The effort documented in this report defines approaches for designing
highly reliable airborne processors using, for the most part, present
off-the-shelf hardware. Designs are described in detail for several
special purpose chips that handle the voting and error handling tasks
of the fault tolerant design. This program achieved the design of an
airborne processor targeted for flight control applications with a
probability of failure of less than 1 x 10^-9 in a two hour mission.
Throughput is in the vicinity of 300,000 to 400,000 operations per
second. Error recovery would require a maximum of 10 milliseconds. The
CPU instruction set architecture is that of a presently available
processor, so very little additional support software need be developed.

This  report  was  submitted  in  October  1978. 

Publication of this report does not constitute USAF approval of report
findings or conclusions, and has been accomplished only for the exchange
and stimulation of ideas.




TABLE OF CONTENTS

I. INTRODUCTION
   A. GENERAL DESCRIPTION
   B. APPLICATION SUMMARY
      1. Definition of Baseline Processing Requirements
   C. DESIGN GUIDELINES
   D. SDFTP DESIGN

II. CONCLUSIONS

III. SELF-DIAGNOSING FAULT TOLERANT PROCESSOR
   A. GENERAL DESIGN CONSIDERATION AND OPERATION
   B. PROCESSOR DESIGN VERIFICATION
   C. INTERFACE MODULE DESIGN
      1. Self-Checking Checker Without Mask
      2. Voter-Switch
   D. RRU OPERATION
      1. Architecture Review
      2. Error Response Operations
      3. Self Test Operations
   E. RRU DESIGN
      1. Hardware Components
      2. Description of RRU Signals
      3. Maskable SCC Tree
   F. RRU PROCESSOR SOFTWARE
      1. Design
      2. Program Size Estimates
      3. Error Response Timing Estimates
      4. Self Test Timing Estimates
   G. CUSTOM DEVICES
      1. Summary
      2. Self-Checking Checker Without Mask
      3. Self-Checking Checker (S.C.C.) With Mask
      4. Voter-Switch
      5. RRU Clock Controller
      6. Microsequencer

IV. SELF TEST

V. RELIABILITY ENHANCEMENT AND PREDICTION
   A. INTRODUCTION
   B. SDFTP RELIABILITY MODEL
   C. FAILURE RATE CALCULATIONS
   D. RELIABILITY ESTIMATES

VI. COMPARISON OF SDFTP AND SIMPLEX PROCESSORS

VII. PROGRAM PLAN

REFERENCES

Appendix

I. DEFINITION OF BASELINE PROCESSING REQUIREMENTS

II. DESIGN GUIDELINES


LIST OF ILLUSTRATIONS

Figure No.  Title

1.  Bit-Slice Microprocessor
2.  Interface Interconnections
3.  Processor (Less RRU)
4.  Triplex RRU
5.  Block Diagram SDFTP
6.  Partition Interface Interconnection (Voter and Switch and
    Self-Checking Checkers)
7.  Simplex Prototype Processor Design
8.  Voter and Switch and Self-Checking Checkers
9.  Totally Self-Checking Checker (TSC)
10. S.C.C. Without Mask
11. Scanning Register #1
12. Triplicated Voter and Switch
13. Processor System Overview
14. Reconfiguration Recovery Unit Overview
15. Functional Detail of Voter/Switches and SCC's
16. Interpretation of SCC Error Signals
17. System Error Maps
18. Switch Fault Error Reporting
19. First Part of RRU Operational Flow Diagram
20. Error Signal Order on RRU I/O Ports
21. Diagnostic Flow Diagram
22. Reconfiguration and Masking Flow Diagrams
23. Final Part of RRU Operational Flow Diagram
24. Self-Test Diagram
25. RRU Off Line Self Tests
26. RRU On Line Self Tests (Voter Switch & Clock Controller)
27. RRU On Line Self Tests (SCC)
28. Block Diagram of RRU System
29. Microprocessor and Combination ROM and I/O Port Interconnections
    for Baseline Components
30. Input Signals
31. Output Signals
32. RRU Maskable S.C.C. Tree
33. S.C.C. With Mask
34. RRU Program Size Estimates
35. Error Response Timing Estimates
36. Self Test Timing Estimates
37. Totally Self-Checking Checker (TSC)
38. Self-Checking Checker (S.C.C.) Without Mask
39. Scanning Register Format
40. S.C.C. Without Mask Pin Usage
41. S.C.C. With Mask
42. S.C.C. With Mask Pin Usage
43. Triplicated Voter and Switch
44. Voter and Switch Pin Requirements
45. Single Channel RRU Clock Controller
46. RRU Clock Controller Pin Assignment
47. Microsequencer
48. Microsequencer Control
49. Tally Counter
50. Multiplexer, Microprocessor Register and Incrementer
51. Microsequencer Stack
52. Simplex Processor Probability of Failure
53. SDFTP Probability of Failure
54. Proposed Schedule - Self Diagnosing Fault Tolerant Processor
    Demonstrator


I. INTRODUCTION


A.  GENERAL  DESCRIPTION 

The Self-Diagnosing Fault Tolerant Processor (SDFTP) demonstrates
that large scale integrated (LSI) circuit devices can be used to effectively
implement current and future military avionic digital system requirements.
In this way the predicted advantages of reduced size, weight and cost of
LSI implementations are realized. The design also shows that commercially
available LSI devices can be used to implement a large percentage of the
processor. By supplementing these devices with a few special part types and
using self-diagnosing and fault-tolerant techniques, a processor has been
designed with the high fault tolerance and predicted reliability
required for airborne fly-by-wire flight control processor application. Many
of these special circuits should find use in other equipment having lower
fault tolerance and reliability requirements, to the extent that their
utilization could become widespread.

This  high  predicted  reliability  is  achieved  while  taking  into  account 
the  modifications  to  the  fault  models  of  current  digital  circuits  which  highly 
integrated  devices  require  for  high-confidence  reliability  predictions. 

The design employs both dynamic and static redundancy in conjunction
with self-diagnosing design techniques to produce the necessary reliability
enhancement and fault-tolerance improvement of the simplex processor.
Although redundancy is employed as the primary means of achieving fault
tolerance, periodic self-testing is recommended for status assessment and
initial flight check-out. Because of this approach, maintenance should be
processor directed, using built-in self-diagnosis.

Both bit-slice and monolithic microprocessors are used in a
complementary fashion that matches the device capabilities to the functional
division of the processor requirements. The division of labor between the
processors is derived primarily from the performance and reliability
requirements and the desire for a wide range of applicability. Assigning
the application functions to the bit-slice processor and the management of
the fault tolerance requirements to the computer-on-chip family of devices
produces a design that is well matched to its intended use.

This report begins with a summary of the baseline application studies,
which led to the selection of the fly-by-wire flight control application, and of
the design guidelines developed for the design of self-diagnosing processors
using LSI components. A summary of the self-diagnosing fault tolerant
processor (SDFTP) design concludes this Section. Section II presents
conclusions and recommendations.

Section III, Processor Design, is a detailed description of the design,
beginning with a general overview followed by a discussion of the bit-slice
processor and the computer-on-chip implementation of the Reconfiguration and
Recovery Unit (RRU). Section IV, Processor Self-Test, relates the self-test
requirements to some preliminary simulation results in terms of test program
size and coverage based on LSI device implementations. Reliability modeling
and failure analysis of the SDFTP and the simplex processor are covered in
Section V, Reliability Enhancement and Prediction.

The SDFTP and the simplex processor are compared in Section VI, and
Section VII discusses the plan for designing, building, and demonstrating a
self-diagnosing, fault-tolerant processor. The two appendices detail the
baseline requirements study and the design guidelines.

B.  APPLICATION  SUMMARY 

Two  airborne  applications  were  selected  as  potential  sources  of 
baseline  requirements.  The  applications  considered  were  the  fly-by-wire 
flight  control  processor  and  the  synthetic  aperture  ground  map  function  of 
an  airborne  multimode  radar  signal  processor.  Both  applications  were 
examined  to  determine  the  requirements,  beginning  with  mission  identification 
and  functional  analysis.  The  work  led  to  the  development  of  algorithm  flow 
followed  by  performance  analysis  of  representative  tasks  and  resource  sizing, 
in  terms  of  memory,  processor  speed  and  complexity  as  measured  by  the 
variety  of  operations  and  execution  speed.  These  results  are  summarized  in 
Appendix I.

1. Definition of Baseline Processing Requirements

The flight-control application is considered first in Section A of
Appendix I, since it represents a set of requirements that falls within the
capabilities of systems that can be configured in a single programmable
computer structure and that can be implemented today using existing
LSI devices such as microprocessors and memories. It is estimated that a
high performance, control configured, fly-by-wire aircraft would require less
than 16,000 words of 16-bit wide memory and could be controlled by a
processor capable of executing instructions at a rate of 300 to 400 thousand
operations per second (KOPS). Because of the safety requirements of this
application, quadruple redundancy coupled with software implemented
redundancy management leads to a sophisticated input/output system that
connects the electronic flight control system to the aircraft control sensors
and actuators. The reconfiguration approach selected is designed to achieve
"failed op-squared" fault tolerance for the electronics.

The second application, Synthetic Aperture Ground Map Processing
of a Multimode Radar (described in Appendix I), results in signal processor
requirements that are beyond the capability of current and near future single
conventional microprocessor designs. However, special programmable
pipeline processors and netted sets of microprocessors are believed capable
of achieving the performance required. As in many other radar signal
processing applications, the core signal processing function is the ability
to generate a Doppler frequency analysis of the radar return. For this
ground-mapping mode of the multimode radar, a processing rate in excess of
20 x 10^6 complex multiplies is required in addition to a signal processing
operation rate in the 1-2 million instructions per second range. Compared
to the flight-control application, the multimode memory requirements are
significantly larger and are estimated to fall in the 3.5 million bit range.
This storage is normally distributed throughout the signal processor and must
provide a high memory accessing rate capability, which is a function of the
specific radar mode and signal processor architecture.

C. DESIGN GUIDELINES

The effects of architecture, functional partitioning, and module
and component features on the microprogrammable self-diagnosing capabilities
of digital processors were investigated. These results were then used to
create a set of guidelines for designing self-diagnosing, fault-tolerant
processors, both monolithic and bit-slice. Appendix II, Design
Guidelines, covers the findings of the study and describes the techniques
considered.

Application of the guidelines strongly influenced the design of the
SDFTP. One of the major conclusions is that architectural considerations
are of primary importance in the design of a self-diagnosing processor,
particularly one making extensive use of large scale integrated (LSI)
circuits. A major factor in this conclusion is that fault models of these
self-diagnosing processors should include multiple errors. Applying this
constraint in the evaluation of checking techniques leads to the selection of
replication as the preferred coding approach in general, and for processors
in particular. Periodic testing is rejected as a primary approach because of
its poor detection of inconsistent errors, incompatibility with real-time
requirements, uncertain effectiveness of diagnostic routines in accounting
for multiple errors, and difficulty in generating multiple error diagnostics.

Utilization of redundancy for self-diagnosis and fault tolerance leads
to an increase in the probability of failure and to the desirability of enhancing
the reliability of the self-diagnosing processor design. However, most of the
redundancy techniques that are theoretically interesting apply only at the
component level and, if applied as static redundancy, are effective only for
times that are short compared to their mean time to failure. Consequently,
bit-slice processors are preferred over monolithic microprocessors because of
their lower circuit complexity and greater freedom in partitioning.
Technological considerations dictate that redundancy be applied external to
the device, at least for the immediate future.

Partitioning of the processor designs is based on hardware attributes,
fault error models, and type of diagnosis. Hardware attributes include partition
function, structure regularity, size, speed and communication requirements.
Using these criteria, functional partitioning was determined to be the most
effective type. For a bit-slice processor the partitions are: processor,
control, memory, input/output and buses. Placement of the partition
boundaries was strongly influenced by the breadth of communication required
among the partitions, since the size of the interface circuitry was a direct
function of the number of interface signals.

The guideline summary for a fault tolerant, self-diagnosing bit-slice
microprocessor is presented in Table I. The bus recommendation
applies only to internal communication within the processor and memory and
does not account for noise. Memories can also be an exception to the general
recommendation of redundancy for self-diagnosing protection. For medium
to large-sized memories, conventional coding techniques are efficient
and effective. Specifically, a single error correcting, double error detecting
(SEC/DED) code with self-checking implementation of the encoders and
decoders is recommended.
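
As an illustration of the kind of memory code recommended above, the
following Python sketch implements an extended Hamming SEC/DED code for one
16-bit word. The report does not give the code construction, so the 22-bit
layout, the bit ordering, and the names encode and decode are assumptions
made for this sketch. The self-checking encoder and decoder hardware is not
modeled.

```python
# Sketch of a SEC/DED (extended Hamming) code for one 16-bit memory word.
# Assumed layout: index 0 holds overall parity, indices 1..21 hold the
# Hamming codeword with check bits at positions 1, 2, 4, 8 and 16.

CHECK_POSITIONS = (1, 2, 4, 8, 16)
CODE_LEN = 21  # 16 data bits + 5 Hamming check bits


def encode(data):
    """Encode a 16-bit integer as a list of 22 bits."""
    word = [0] * (CODE_LEN + 1)
    bit = 0
    for pos in range(1, CODE_LEN + 1):
        if pos not in CHECK_POSITIONS:
            word[pos] = (data >> bit) & 1
            bit += 1
    for p in CHECK_POSITIONS:
        # Each check bit gives even parity over positions whose index has bit p set.
        word[p] = sum(word[pos] for pos in range(1, CODE_LEN + 1) if pos & p) % 2
    word[0] = sum(word[1:]) % 2  # overall parity over the Hamming codeword
    return word


def decode(word):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = list(word)
    syndrome = 0
    for pos in range(1, CODE_LEN + 1):
        if word[pos]:
            syndrome ^= pos
    parity_bad = sum(word) % 2 == 1
    if syndrome == 0 and not parity_bad:
        status = "ok"
    elif parity_bad and syndrome <= CODE_LEN:
        word[syndrome] ^= 1  # syndrome 0 points at the overall parity bit itself
        status = "corrected"
    else:
        # Non-zero syndrome with good overall parity: a double error was detected.
        return None, "uncorrectable"
    data, bit = 0, 0
    for pos in range(1, CODE_LEN + 1):
        if pos not in CHECK_POSITIONS:
            data |= word[pos] << bit
            bit += 1
    return data, status
```

A single flipped bit anywhere in the 22-bit word is corrected, while any two
flipped bits are reported as uncorrectable rather than being miscorrected.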

D.  SDFTP  DESIGN 

The SDFTP design is a synergistic combination of commercially
available bit-slice microprocessors and the computer-on-chip family of devices.
It can be the archetype for a range of microprocessor-based systems that use
LSI devices and meet high reliability military standards. It employs
self-diagnosing techniques, redundancy, and deferred maintenance to achieve a
high level of fault tolerance; i.e., it tolerates two faults, with correct
operation after the second fault. The design is developed according to a
top-level functional partitioning of the requirements into two sets: the
application set and the fault tolerant set. The application set is
implemented by a replicated bit-slice microprocessor that handles the
execution processes and communicates with the outside world.

The  second  set  is  concerned  with  the  detection  of  errors  and  the 
management  of  the  resources  to  achieve  the  required  level  of  fault  tolerance. 

It  is  implemented  with  a  combination  of  special  custom-designed  devices  and 
a  computer-on-chip  family  of  devices.  This  combination  of  devices  performs 
the  five  functions  of  fault  tolerance: 

1)  error  detection 

2)  error  location 

3)  failed  function  substitution 

4)  reconfiguration 

5)  recovery 

This distribution of resources follows the Design Guidelines developed
at the beginning of this program (see Appendix II, Design Guidelines). It
reflects the observation that architecture is the most important factor
in the design of a self-diagnosing system.

In highly reliable systems, such as this flight-control application,
partitioning also ranks high because of the trade-off that must be achieved
between access, partition failure rate, and communication path width.
Functional partitioning of the bit-slice microprocessor resulted in:

1) generality of application through microprogrammability

2) performance sufficient to satisfy the flight-control application

3) partition failure rates low enough that replication could achieve
the high reliability requirements of flight-control applications

4) partition interfaces with reasonable hardware coupling, such that
the necessary interface devices had relatively low failure rates in
comparison with those of their partitions

The special custom devices were instrumental in achieving this
last result. They are of a complexity, structure, and size such that
conventional tools can be effectively used to analyze their properties, and
they have failure rates that are small compared to those of the partitions.
They provide for effective testability by incorporating scanning registers
with low level combinatorial networks between these registers.

Both the bit-slice microprocessor and the computer-on-chip (COC) are
triplicated for five reasons:

1) Three copies of a function are needed, at a minimum, if
"failed op-squared" fault tolerance is needed.

2) It was found that triplication provided sufficient reliability
enhancement for the short missions, so that a probability
of failure of less than 1 x 10^-9 has been predicted.

3) Triplication is next to the cheapest of the single error
codes that can be used to reliably detect the multiple
bit errors exhibited by LSI devices.

4) Totally self-checking checkers (TSC's) can be designed and
applied in such a way that any illegal input code (nonidentical
triad of signals) as well as any checker failure can be detected.
This extends the protection boundary of the bit-slice
processor to the input buffers of the computer-on-chip.

5) By employing a triplicated cascade of these totally
self-checking checkers in a tree structure, a malfunction
transparent error collection network can be implemented.
This network transforms the multiple self-checking checker
outputs to a single signal that can interrupt the processing
in the bit-slice processors and can alert the computer-on-chip
that an error has been detected in the bit-slice processors.

A single string version of the SDFTP is shown in Figure 1. The
bit-slice processor has been divided into three functional partitions,
resulting in three internal interfaces and one output (memory) interface.
Each interface includes a pair of self-checking checkers (SCC) with
triple-encoded outputs connected to the COC input buffers and the SCC tree.
When an error occurs, the COC is interrupted by the output of the SCC tree
and reads in the error information from the interfaces through the COC Input
Buffer. Using this information, the COC locates the error by interface and
device and determines whether reconfiguration of the bit-slice processor is
required. If it is, the COC sends a reconfiguration command, by means of its
Output Buffer, to the appropriate bit-slice processor voter-switches located
at each interface. After reconfiguration has been completed, the COC initiates
recovery on the bit-slice processor by interrupting it and supplying it with
an interrupt vector, which, in most instances, is the address of the last
roll-back point of the application process in execution at the time of the
interrupt. If reconfiguration is not required after locating the error, the
COC initiates the recovery process directly.
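
The error response sequence just described can be outlined in a few lines
of Python. This is only a sketch of the ordering of steps taken from the
text. The class name ErrorResponder, the step labels, and the way the
reconfiguration decision is passed in as a flag are all hypothetical, since
the decision rule itself is not given in this section.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorResponder:
    """Hypothetical stand-in for the COC error-response software."""
    rollback_address: int  # last roll-back point, assumed to be known
    history: list = field(default_factory=list)
    switch_commands: list = field(default_factory=list)

    def on_scc_tree_interrupt(self, error_report, needs_reconfiguration):
        """Handle one detected error; return (steps taken, interrupt vector)."""
        steps = ["read_error_info"]
        interface, device = error_report  # error located by interface and device
        steps.append("locate_error")
        if needs_reconfiguration:
            # The command reaches the voter-switches through the output buffer.
            self.switch_commands.append((interface, device))
            steps.append("reconfigure")
        steps.append("recover")
        # The error history is updated whether or not reconfiguration occurred.
        self.history.append((interface, device, needs_reconfiguration))
        return steps, self.rollback_address
```

In this outline recovery always follows location of the error, and
reconfiguration is inserted between the two only when it is required.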

In both cases the COC saves the error information and updates its
error history and SDFTP configuration status. This information can be made
available during flight for status assessment and, when not in flight-control
use, for computer-aided maintenance. This leads to improved repair and
higher processor availability.

Figure 1. Bit-Slice Microprocessor

When the COC is not processing error reports, it performs self-test,
reads out data to a status display, or exercises the SDFTP to determine
if the error response system is operational.


Each interface between the bit-slice processor partitions consists
of a fixed interconnection of devices that perform the following functions.

Function                              Device

1) Check Partition Output             (Partition) Output SCC

2) Reconfigure Interface &            Voter-Switch
   Perform Failed Function
   Substitution

3) Check Voter-Switch Output          (Partition) Input SCC

Interface interconnections are shown in Figure 2. Each interface
has an output and an input SCC function. The output SCC function is
implemented by special custom devices, which perform the totally self-checking
checker function on pairs of signals from the three partition outputs (COPY 1,
COPY 2, COPY 3) and have three code outputs. The device also snapshots
each triple of inputs every microcycle of the bit-slice processor. These
inputs and the three TSC outputs can be read out to the COC on command. The
three TSC outputs are also connected to the SCC trees shown in Figure 1.

The input SCC device is identical to the output SCC. It receives inputs
from the Voter-Switch device. These triple inputs from the Voter-Switch are
checked in pairs by connecting the three combinations of signals to be
checked to the inputs of the three TSC's. The outputs of the TSC's are also
connected to both the COC input buffers and the SCC trees. Snapshots of the
Voter-Switch outputs are also captured every microcycle and can be read out
on command from the COC.
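
The pairwise checking of three copies, and the way the pattern of
mismatches points at a faulty copy, can be sketched as follows. This is an
illustration under stated assumptions: the real TSC outputs are encoded
signals rather than plain booleans, and the function names and result
labels here are hypothetical.

```python
def pairwise_check(copy1, copy2, copy3):
    """Return mismatch flags for the pairs (1,2), (1,3) and (2,3)."""
    return (copy1 != copy2, copy1 != copy3, copy2 != copy3)


def locate(flags):
    """Interpret the three mismatch flags.

    Returns None when all copies agree, the number of the disagreeing
    copy when exactly one copy differs, 'checker' when the pattern cannot
    be produced by any inputs (a single mismatch flag), and 'multiple'
    when all three copies differ from each other.
    """
    e12, e13, e23 = flags
    count = sum(flags)
    if count == 0:
        return None
    if count == 1:
        # If two pairs agree the third pair must agree too, so a single
        # flag suggests a fault in the checking logic itself.
        return "checker"
    if count == 3:
        return "multiple"
    if not e23:
        return 1  # copies 2 and 3 agree, so copy 1 is the odd one out
    if not e13:
        return 2
    return 3
```

For example, locate(pairwise_check(5, 7, 5)) identifies copy 2 as the
disagreeing copy, since copies 1 and 3 still agree.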


Reconfiguration and failed function substitution are the functions
performed by the Voter-Switch. Each triple of input bits of the partition word
can either be voted, or one of the three bits can be connected to the output
by a three-way switch. The COC controls the output by sending
a command that selects either the voter or the switch. If the switch connection
is selected, the command also selects one of the three positions of the switch
so that the chosen Voter-Switch input is connected to the output.
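
A bitwise model of the Voter-Switch behavior described above is given
below as a minimal sketch. The function name, the use_voter flag, and the
1-to-3 numbering of the switch positions are assumptions for illustration;
the command encoding used by the actual device is not described here.

```python
def voter_switch(copy1, copy2, copy3, use_voter=True, select=1):
    """Return the output word for three input copies of a partition word.

    With use_voter set, each output bit is the majority of the three
    corresponding input bits. Otherwise the copy chosen by select
    (1, 2 or 3) is passed through unchanged.
    """
    if use_voter:
        # Bitwise two-out-of-three majority.
        return (copy1 & copy2) | (copy1 & copy3) | (copy2 & copy3)
    return (copy1, copy2, copy3)[select - 1]
```

With inputs 0b1010, 0b1010 and 0b0110 the voter returns 0b1010, masking
the disagreeing third copy, while switch position 3 passes 0b0110 through.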

Two  other  special  devices  have  been  defined  to  make  the  fault 
tolerance  more  effective  and  efficient.  They  are  the  SCC  With  Mask  and  the 
Clock  Controller.  The  SCC  With  Mask  is  used  in  the  SCC  tree.  It  has  two 
features  that  distinguish  it  from  the  SCC  Without  Mask.  They  are  a  wider 
input  capability  (24  bits  versus  16  bits)  and  an  input  masking  capability.  This 
mask  capability  permits  an  input  to  be  blocked  so  that  its  signal  does  not 
contribute  to  the  output  of  the  device.  It  is  used  to  prevent  a  faulted  device 
from  causing  interrupts  after  it  has  failed. 
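
The effect of the mask can be shown with a short sketch. It assumes, for
illustration only, that each of the 24 inputs is modeled as one error bit
and that a set mask bit blocks the corresponding input; the names below are
hypothetical and the encoded checker signals of the real device are not
modeled.

```python
WIDTH = 24  # input width of the SCC With Mask, from the text


def masked_error(error_inputs, mask):
    """Return True if any unmasked input reports an error.

    error_inputs and mask are integers used as 24-bit fields. A set bit
    in mask blocks the matching input from contributing to the output.
    """
    live = error_inputs & ~mask & ((1 << WIDTH) - 1)
    return live != 0
```

Once input 2 has been masked, masked_error(0b100, 0b100) is False, so a
device known to have failed no longer raises interrupts, while an error on
any other input still does.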

The clock controller is used with the COC device to inhibit
inadvertent reconfiguration of the bit-slice processor. This protection is
implemented by interlocking the reconfiguration command clock using redundancy
and key codes.
Figure 2. Interface Interconnections

The last special circuit is the microsequencer, which replaces six
devices in the microsequencer partition. Since the SDFTP design replicates
this partition three times, the savings are tripled. In addition to these
special considerations, the device has general appeal since it is believed
to be a better match than presently available microsequencer devices to most
of the applications likely to be encountered in dedicated military use.


These devices and techniques result in a design of greatly improved
reliability compared to the simplex processor. The SDFTP reliability is
calculated to be four to five orders of magnitude better than the simplex
processor for self-test coverage in the range of 0.9 to 1.0. These results
are based on detailed models of the SDFTP, which have been conservatively
established. The associated failure rates that were used in the model were
computed using military handbook values or procedures. Where necessary,
as is the case for custom devices and more complex commercially available
devices, projections of the failure rates were developed. These projections
were based on an extrapolation of the complexity weights of the failure rate
equations. For an assumed self-test coverage factor of 0.95, the probability
of failure was determined to be less than 1 x 10^-9 for a three-hour mission.


Thus, compared to the simplex processor design, the SDFTP is
significantly superior for highly fault-tolerant, high-reliability applications
with short mission times, such as the fly-by-wire electronic flight control
processor. The SDFTP design can tolerate two faults and still provide
undegraded operation, while the simplex processor has no fault tolerance at
all. The testability improvement has not been demonstrated, but testability
should also be much superior to that of the simplex design because of the
partitioning and the insertion of the scanning registers. Hence, the repair
rate and availability should be much improved, since the design is
self-diagnosing and retains a run time history.


Another of the major improvements should be in the quality of the
output, because of the tolerance of the design to inconsistent errors.
Transients should, in general, be masked until the occurrence of a second
fault in a device. The value of this improvement is difficult to quantify,
but available data suggest that inconsistent errors can have frequencies of
occurrence that are 10 times that of solid errors.

Performance of this SDFTP in terms of execution rate and throughput
closely parallels that of the simplex processor. The difference is believed
to be small enough to be negligible. Loss of throughput due to the
processing of a detected error is between 5 and 10 milliseconds per detected
error, with the average closer to five. This is due to the low probability of
a second error in the same interface device.

The price for this improvement is, of course, an increase in
physical resources. The SDFTP requires about 210 devices, which is about
4.6 times the number of devices needed to implement a simplex design.
Most of the increase is due to the triplication requirement, but the difference
between the 4.6 factor of the SDFTP and triplication, 1.5, is a measure of
the efficiency of the self-diagnosing self-checking checker and the dynamic
redundancy control. The parts type comparison shows that eight additional
devices are required, most of which are of the LSI variety. Four of these
are special custom designs. At the cost of adding one additional device the
total parts count can be reduced to about 195 devices by employing the
microsequencer custom device.
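
The device-count figures quoted above can be checked with a few lines of
arithmetic. The sketch below uses only the numbers in the text. Reading the
1.5 figure as the ratio of the 4.6 factor to plain triplication (a factor
of 3) is an interpretation, not something the text states explicitly, and
the variable names are illustrative.

```python
sdftp_devices = 210        # approximate SDFTP device count, from the text
ratio_to_simplex = 4.6     # SDFTP devices relative to a simplex design

# Implied size of the simplex design, about 46 devices.
simplex_devices = sdftp_devices / ratio_to_simplex

# Overhead beyond plain triplication, about 1.53 when read as a ratio.
overhead_vs_triplication = ratio_to_simplex / 3

# With the custom microsequencer the total drops to about 195 devices,
# which is about 4.27 times the implied simplex count.
with_microsequencer = 195
ratio_with_microsequencer = with_microsequencer / simplex_devices
```

On these figures the checking and reconfiguration hardware adds roughly
half again as many devices as triplication alone.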

For this commitment, the advantages of a very reliable,
fault-tolerant, self-diagnosing, LSI implemented design can be realized.
Increasing the number of LSI devices and the degree of integration of the
devices decreases the size, weight, and ultimately the cost. This approach
enhances these advantages while accounting for the likelihood of LSI induced
multiple-bit errors.
II. CONCLUSIONS


A two fault tolerant, self-diagnosing, highly reliable microprocessor,
capable of providing modest throughput, has been designed using design
guidelines based on LSI devices. This microprocessor's execution speed, fault
tolerance, and short mission reliability exceed the selected fly-by-wire
electronic flight control processor baseline requirements. The guidelines
have yielded a design that tolerates the multiple bit error patterns that can
accompany LSI device operation. Execution of the design has reinforced the
design guideline conclusions: to wit, that architectural considerations are of
primary importance in the design of a self-diagnosing microprocessor.
Standardization of the "optimized" functional partition interfaces has
permitted the definition of just two custom LSI interface devices that
implement the totally self-checking checkers and the Voter-Switch devices for
all of the partition interfaces. The resulting interface implementation has
sufficiently high reliability that it does not appreciably degrade the
partition reliability.

Top-level functional decomposition of the total requirements has permitted
a bit-slice processor implementation of the application requirements using
commercially available devices. An optional custom LSI circuit, which
decreases the total parts count of the SDFTP by 15, has been defined. Execution
of the other top-level functions by a computer-on-chip implementation results
in error recovery processing times on the order of 5-10 milliseconds. These
functions include the error vector pattern analysis for fault location and error
management for reconfiguration and recovery. The average execution time should
be in the 5-7 millisecond range, since the longer times correspond to a double
error in the same interface device, which has a relatively low probability.

This SDFTP requires a commitment of about 4.6 times the simplex parts
count. For some applications this may be too expensive, and other designs
should be considered if the fault tolerance and/or reliability requirements
are not as severe as in this application. A single-fault-tolerant design appears
to have an appreciably lower parts count while retaining most of the testability
features of the more tolerant design.

III. SELF-DIAGNOSING FAULT TOLERANT PROCESSOR


A. GENERAL DESIGN CONSIDERATION AND OPERATION

The Self-Diagnosing Fault Tolerant Processor (SDFTP) design
consists of triplicated structures. A triplicated bit-slice microprocessor,
as shown in Figure 3, executes the flight control program while a triplicated
monolithic microprocessor manages the bit-slice microprocessor's redundancy
(see Figure 4). This Reconfiguration and Recovery Unit (RRU) processes the error
reports generated by the bit-slice processor checkers to determine how the
bit-slice processor should be configured and the sequence of programs that the
bit-slice processor should execute to return to normal processing.

Basically, a bit-slice processor was selected for the flight control
processor because (1) it provided a better match to the throughput requirements
of the flight control application, and (2) its lower level of integration permits
smaller redundant structures to be implemented. The second attribute permits
lower failure rate structures to be replicated, resulting in significantly
more reliable designs for short missions. These smaller structures have
been selected according to the guidelines presented in Appendix II. Accordingly,
the bit-slice processor architecture was divided into three parts corresponding
to the three functions of the processor hardware (see Figure 5). These
partitions are the microsequencer, the control store-pipeline register, and
the processor array, which includes the microprocessor bit slices. As shown
in Figure 1, there are three internal and three external partition interfaces.
All of the internal interfaces and the processor array-memory external
interface include Voter-Switch devices for changing the interconnections between
the pairs of partitions. Included in each of these interfaces is a pair of checkers
for checking the output and input of the partitions involved in each interface,
as shown in Figure 6.

It is these checkers that detect any partition output errors and
Voter-Switch errors at each interface. In addition, these checkers possess
the very significant feature that they can detect their own errors. These
self-checking checkers (S.C.C.) also extend the boundary of protection from the
partitions and Voter-Switches to the periphery of the checkers. Each of the
checkers receives an input from each of the copies of the partition or the
Voter-Switch and develops an error signal corresponding to a check of each
pair of signals. Since there are three possible pairs, each S.C.C.
has three outputs corresponding to these three pairs of signals.
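
The pairwise checking described above can be sketched as follows. This is a minimal illustration only: the pair ordering (A-B, B-C, A-C), the channel names, and both function names are assumptions, and the sketch models plain equality checks rather than the dual-rail, self-checking hardware of the actual S.C.C.

```python
from typing import Optional, Tuple


def pairwise_errors(a: int, b: int, c: int) -> Tuple[bool, bool, bool]:
    """Return the three pairwise disagreement flags in the order (A-B, B-C, A-C)."""
    return (a != b, b != c, a != c)


def locate_single_fault(errors: Tuple[bool, bool, bool]) -> Optional[str]:
    """Name the copy implicated by exactly two of the three pair checks."""
    ab, bc, ac = errors
    if ab and ac and not bc:
        return "A"
    if ab and bc and not ac:
        return "B"
    if bc and ac and not ab:
        return "C"
    # Either no disagreement, or a pattern one bad copy cannot explain.
    return None


if __name__ == "__main__":
    print(locate_single_fault(pairwise_errors(5, 7, 5)))
```

A single bad copy disagrees with both of the others, so it shows up in exactly two of the three pair checks; that is what makes three outputs sufficient to point at one channel.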

These S.C.C. error signals are used to alert the RRU that an error
has been detected, and to locate the partition, Voter-Switch, or S.C.C. that has
the error. Each of the Maskable S.C.C.'s shown in Figure 4 combines all
of the error outputs from the bit-slice microprocessor S.C.C.'s and produces
a dual-rail signal that interrupts its associated RRU computer. The interrupt
causes each RRU computer to begin its diagnostic program to locate the error.
Simultaneously, the S.C.C. dual-rail signal interrupts the bit-slice processor
clocks and prevents the error signal from propagating beyond the partition
in which it originated. With the bit-slice processors stopped, the RRU's
have time to examine all of the bit-slice processor's S.C.C. output signals
without concern that they might change. By reading the contents of one of
the three snapshot registers in the S.C.C., each RRU computer can determine
the source of the error and whether reconfiguration of the bit-slice processors
is required. Depending on the status of the SDFTP, the RRU may read all or
only a portion of the contents of the snapshot register. In some cases, the
error vector associated with each S.C.C. may be sufficient. In other cases,
the entire snapshot may be read out, including the current S.C.C. input vectors
and the immediately preceding microcycle input vectors. By comparing these
vectors, the failed partition or device can be located. In some second-error
situations, the comparison may require that a self-test program be run, in
which case the comparison includes a predetermined value that is known to be
correct. If the snapshot values differ from the precomputed value, their source
is inferred to have failed.

Figure 6. Partition Interface Interconnection
(Voter and Switch and Self-Checking Checkers)

Combining this information with the stored status of the SDFTP, the
RRU computers determine the best configuration for the SDFTP and, if it is
not the current one, determine the commands that must be issued to the
Voter-Switches to produce this configuration. Each RRU controls one of three
sets of voters and switches in each Voter-Switch device. By issuing the proper
command, the RRU computer can select a configuration of one of three channels
in each Voter-Switch. One of the commands selects a voter that produces the
majority function of three inputs, corresponding to some bit of the interface
data. The other three commands select one of three switches. Each switch
connects one of its three inputs to an output for each bit of the interface signal.

The recommended mode of operation is to use the voter through the
occurrence of the second error in the same partition or Voter-Switch in order
to retain the masking capability of the voter for as long as possible. Hence,
the operating algorithm assumes that the first error is an inconsistent one,
since the expected frequency is considerably higher than that for consistent
(solid) faults; it also assumes that the error will not recur when the program
is rolled back to the last roll-back point. Solid errors, of course, will be
detected again at the same point in the program, and the system will then
be reconfigured using the switches to select an input signal. Each channel
will use a different switch so that the outputs are independent insofar as
possible. The failed channel will have to use one of the remaining two
channels that are still good.

After an error has been confirmed as being consistent, the Maskable
S.C.C.'s are modified so that the S.C.C. output producing the error is
rendered ineffective in controlling the RRU computer and clock interrupt
signals. This is achieved by allowing the RRU computers to generate a mask
that blocks out only the signal inputs produced by the error to the Maskable
S.C.C.'s.


Once reconfiguration is completed, recovery is initiated. For all
of the first errors in a device or partition, recovery consists of turning the
bit-slice processor clocks on and vectoring the bit-slice processors to the last
roll-back point in the executing task. The same procedure is followed for all
second faults that do not occur in the same device or partition. However,
when the second error occurs in the same partition or the same part of the
Voter-Switch, the recovery process requires the execution of a self-test program
to identify which of two partitions is faulted. If the self-test program does not
result in the detection of an error, the bit-slice processor automatically falls
through to the last roll-back point. If another error located in the same place
is detected, the SDFTP is reconfigured and then the bit-slice processors are
vectored back to the last roll-back point by the RRU computers.
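
The recovery policy described above can be summarized in a short sketch. It is a simplification under stated assumptions: the function name, its parameters, and the step labels are hypothetical, and the real RRU programs are not reproduced in this section.

```python
from typing import List


def plan_recovery(prior_errors_at_location: int,
                  self_test_finds_error: bool = False) -> List[str]:
    """Return an ordered list of recovery steps (hypothetical step names)."""
    if prior_errors_at_location == 0:
        # First error at this location: assume it will not recur,
        # keep the voter in place, and simply retry from the roll-back point.
        return ["restart_clocks", "roll_back"]

    # Second error in the same partition or Voter-Switch part:
    # a self-test decides which of the two remaining suspects is faulted.
    steps = ["run_self_test"]
    if self_test_finds_error:
        # Solid fault confirmed: switch around it and mask its checker input.
        steps += ["reconfigure_with_switches", "mask_checker_output",
                  "restart_clocks"]
    steps.append("roll_back")
    return steps


if __name__ == "__main__":
    print(plan_recovery(0))
    print(plan_recovery(1, self_test_finds_error=True))
```

Treating the first error as non-recurring keeps the voter, and therefore its masking capability, in service for as long as possible, which is the stated reason for the recommended mode of operation.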

When the RRU computers are not processing bit-slice processor
S.C.C. error reports, they are executing self-test and cross-checking each
other. If the RRU Maskable S.C.C. fails, the associated RRU computer
switches to one of the other two Maskable S.C.C. inputs. In some instances,
this will be ineffective and the RRU must reconfigure itself just as it must when
an RRU computer fails. To prevent inadvertent changing of the Voter-Switch
commands, the shift clocks associated with these commands are interlocked
by a Clock Controller device that requires a key (code word) and/or the
majority of the RRU computers to command a clock signal.
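
A minimal sketch of such an interlock follows. The text leaves open whether the key, the majority, or both are required ("and/or"); this sketch assumes both. The key value and the function names are hypothetical.

```python
from typing import Sequence

KEY_WORD = 0x5A  # hypothetical code word; the real key is not given in the text


def majority_requests(requests: Sequence[bool]) -> bool:
    """True when more than half of the RRU computers request the clock."""
    return sum(bool(r) for r in requests) * 2 > len(requests)


def shift_clock_enabled(key: int, requests: Sequence[bool]) -> bool:
    """Enable the command shift clock only with a valid key and a majority."""
    return key == KEY_WORD and majority_requests(requests)


if __name__ == "__main__":
    print(shift_clock_enabled(KEY_WORD, (True, True, False)))
```

With this arrangement, a single failed RRU computer cannot by itself clock a new command into a Voter-Switch.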

B.  PROCESSOR  DESIGN  VERIFICATION 

The bit-slice processor design used in the SDFTP was verified via
the implementation and testing of a prototype. This simplex prototype
processor design included one copy of the microsequencer, control
store-pipeline register, and processor array partitions, as shown in Figure 7.
Here, the partition boundaries are demarcated by the dotted lines. The
processor has an instruction repertoire of over 100 instructions, listed in
Table II. Combining a one-microsecond cycle time memory with the simplex
processor yields a computer that can execute the flight control programs
within the hard deadlines. For typical program mixes, the processing rate is
300,000 - 500,000 operations per second.

TABLE II

ALPHABETICAL LISTING OF MCP-701A INSTRUCTIONS

Mnemonic     Instruction Description                             Format       Opcode       Execution Time

ABSD         Absolute Value of Double Register                   4            1AXX         1.20 - 2.20
[illegible]  Absolute Value of [illegible]                       4            [illegible]  [illegible]
[illegible]  Add to XA (Right Byte) - Immediate                  4            [illegible]  1.25
[illegible]  Add to XB (Right Byte) - Immediate                  4            0DXX         1.25
[illegible]  Add to XC (Right Byte) - Immediate                  4            [illegible]  1.25
[illegible]  Add to LR (Right Byte) - Immediate                  4            0BXX         1.20
ADRU         Add to UR (Right Byte) - Immediate                  4            [illegible]  1.25
ADD          Add Double from Program Memory                      [illegible]  [illegible]  [illegible]
ADDS         Add Double from Scratchpad Memory                   2            [illegible]  [illegible]
ADMS         Add Double to Scratchpad Memory                     2            [illegible]  [illegible]
[illegible]  Add to [illegible] from Program Memory              1            4EXX         2.0
[illegible]  Add to [illegible] from Scratchpad Memory           [illegible]  ACXX         [illegible]
AMS          Add [illegible] to Scratchpad Memory                2            [illegible]  2.5
[illegible]  Clear Scratchpad Word Bits Specified                3            [illegible]  2.75
[illegible]  Clear Indicator (Left Byte) - Immediate             4            [illegible]  1.25
[illegible]  Clear Indicator (Right Byte) - Immediate            4            20XX         1.25
[illegible]  Clear Device Controller                             2            [illegible]  1.20
[illegible]  Clear Interrupt Specified                           4            23XX         1.25
CPLD         Complement Double Register                          4            [illegible]  1.75 - 2.0
CPLU         Complement UR                                       4            [illegible]  1.0
DIV          Divide Double by Program Memory Word                1            [illegible]  [illegible]
DIVS         Divide Double by Scratchpad Memory Word             2            [illegible]  10.5
DSSZ         Decrement and Skip if Scratchpad Word is Zero       2            [illegible]  2.5 - 2.75
[illegible]  Enable Device to Interrupt                          2            [illegible]  [illegible]
[illegible]  Inhibit Device from Interrupting                    2            [illegible]  1.25
[illegible]  Invert [illegible]                                  4            [illegible]  1.25
JINT         Jump to Service Interrupt                           4            21XX         [illegible]
JMP          Jump Unconditional                                  1            [illegible]  1.5
JMPI         Jump Unconditional, Indirect                        1            [illegible]  2.0
JMS          Jump to Subroutine                                  1            [illegible]  [illegible]
JMSI         Jump to Subroutine, Indirect                        1            [illegible]  [illegible]
JSNS         Jump After Interrupt Sense                          2            F4XX         2.75
LALB         Load XA (Left Byte) - Immediate                     4            02XX         1.25
LARB         Load XA (Right Byte) - Immediate                    4            07XX         [illegible]
LBLB         Load XB (Left Byte) - Immediate                     4            03XX         [illegible]
LBRB         Load XB (Right Byte) - Immediate                    4            [illegible]  [illegible]
LCLB         Load XC (Left Byte) - Immediate                     4            04XX         [illegible]
LCRB         Load XC (Right Byte) - Immediate                    4            [illegible]  [illegible]
LDA          Load XA from Program Memory                         1            [illegible]  2.0
LDAS         Load XA from Scratchpad Memory                      2            [illegible]  1.5
LDB          Load XB from Program Memory                         1            [illegible]  2.0
LDBS         Load XB from Scratchpad Memory                      [illegible]  [illegible]  [illegible]
LDC          Load XC from Program Memory                         1            4AXX         2.0
LDCS         Load XC from Scratchpad Memory                      2            [illegible]  [illegible]
LDD          Load Double from Program Memory                     [illegible]  44XX         [illegible]
LDDS         Load Double from Scratchpad Memory                  2            [illegible]  [illegible]
LDL          Load LR from Program Memory                         1            42XX         2.0
LDLS         Load LR from Scratchpad Memory                      2            [illegible]  [illegible]
LDU          Load UR from Program Memory                         1            40XX         2.0
LDUS         Load UR from Scratchpad Memory                      2            7CXX         1.5
LLLB         Load LR (Left Byte) - Immediate                     4            01XX         1.25
LLRB         Load LR (Right Byte) - Immediate                    4            [illegible]  1.25
LULB         Load UR (Left Byte) - Immediate                     4            [illegible]  1.25
LURB         Load UR (Right Byte) - Immediate                    4            05XX         1.25
MPY          Multiply UR by Program Memory Word                  1            [illegible]  [illegible]
MPYS         Multiply UR by Scratchpad Memory Word               2            C4XX         5.75
[illegible]  And to [illegible] from Program Memory              1            5AXX         2.0
[illegible]  And to [illegible] from Scratchpad Memory           2            CCXX         1.5
NRM          Normalize Double                                    4            [illegible]  [illegible]
[illegible]  Or to [illegible] from Program Memory               1            5CXX         [illegible]
[illegible]  Or to [illegible] from Scratchpad Memory            [illegible]  D0XX         1.5
RTN          Return from Subroutine                              [illegible]  2EXX         [illegible]
[illegible]  Return from Interrupt Routine                       4            22XX         5.0
SBD          Subtract Double from Program Memory                 1            54XX         3.0
SBDS         Subtract Double from Scratchpad Memory              [illegible]  [illegible]  3.5
[illegible]  Set Scratchpad Word Bits Specified                  2            74XX         2.75
SBU          Subtract Program Memory Word from UR                [illegible]  52XX         2.0
SBUS         Subtract Scratchpad Memory Word from UR             [illegible]  [illegible]  1.5
SIE          Skip If Program Memory Word is Equal to UR          [illegible]  [illegible]  2.0 - 2.2
SIG          Skip If Program Memory Word is Greater than UR      [illegible]  5EXX         2.0 - 2.2
SIL          Skip If Program Memory Word is Less than UR         1            62XX         2.0 - 2.2
SILB         Set Indicator (Left Byte) - Immediate               [illegible]  1DXX         [illegible]
SIRB         Set Indicator (Right Byte) - Immediate              [illegible]  1EXX         1.25
SISE         Skip If Scratchpad Memory Word is Equal to UR       2            [illegible]  1.75 - 2.0
SISG         Skip If Scratchpad Memory Word is Greater than UR   2            [illegible]  1.75 - 2.0
SISL         Skip If Scratchpad Memory Word is Less than UR      2            [illegible]  1.75 - 2.0
SKLB         Skip On Indicator (Left Byte) - Immediate           4            [illegible]  1.25 - 1.5
SKR          Skip On Device Ready                                2            E4XX         1.75 - 2.0
SKRB         Skip On Indicator (Right Byte) - Immediate          [illegible]  [illegible]  [illegible]
SKSB         Skip On Scratchpad Word Bits Specified              [illegible]  7CXX         [illegible]
SLZ          Shift UR Left - Enter Zeros                         [illegible]  [illegible]  [illegible]
SLZD         Shift Double Left - Enter Zeros                     [illegible]  [illegible]  [illegible]
SLZX         Shift Double Left by XC - Enter Zeros               [illegible]  21XX         [illegible]
SRC          Shift UR Right Circular                             [illegible]  [illegible]  [illegible]
SRCD         Shift Double Right Circular                         [illegible]  [illegible]  [illegible]
SRS          Shift [illegible] Right - Repeat Sign               [illegible]  24XX         [illegible]
SRSD         Shift Double Right - Repeat Sign                    [illegible]  25XX         [illegible]
SRSX         Shift Double Right by XC - Repeat Sign              4            2CXX         [illegible]
SRZ          Shift UR Right - Enter Zeros                        [illegible]  [illegible]  [illegible]
SRZD         Shift Double Right - Enter Zeros                    4            27XX         [illegible]
STAS         Store XA in Scratchpad Memory                       2            [illegible]  1.75
STBS         Store XB in Scratchpad Memory                       2            A4XX         1.75
STCS         Store XC in Scratchpad Memory                       2            [illegible]  [illegible]
STDS         Store Double in Scratchpad Memory                   2            [illegible]  3.0
STLS         Store LR in Scratchpad Memory                       2            [illegible]  1.75
STU          Store UR in Program Memory                          [illegible]  4CXX         [illegible]
STUS         Store UR in Scratchpad Memory                       2            [illegible]  1.75
TSU          Transfer SR to UR                                   [illegible]  [illegible]  1.25
TUS          Transfer UR to SR                                   1            14XX         1.25
XUA          Exchange UR and XA                                  4            10XX         1.75
XUB          Exchange UR and XB                                  4            11XX         1.75
XUC          Exchange UR and XC                                  4            12XX         1.75
XUL          Exchange UR and LR                                  4            [illegible]  1.75
ZRD          Zero Double (UR and LR)                             4            15XX         1.5


C. INTERFACE MODULE DESIGN


Between the triplicated partitions of the bit-slice processor are
the partition interface modules, as shown in Figure 8. These modules are a
combination of partition output SCC's, Voter-Switches, and partition input
SCC's. The function of the SCC's is to produce error patterns so that the
bit-slice processors can be properly reconfigured to switch out a failed partition
or part of an interface module. Using the Voter-Switch devices, the faulted
device can be isolated while maintaining as high a level of redundancy as
possible throughout the design. The usage of these devices in the interfaces
is described next. For more detailed information see Section III-G, Custom
Devices.


Figure 8. Voter and Switch and Self-Checking Checkers


1. Self-Checking Checker Without Mask

The S.C.C. detects errors in its inputs as well as any faults in the
checker itself. In a self-checking circuit such as the S.C.C., the inputs and
outputs are encoded so that any assumed fault within the circuit, or any
non-code input, produces a non-code output for at least one of the normally
occurring inputs. The validity of the output code words is checked by a check
circuit that produces valid/invalid indications. The check circuit is also
designed so that it is self-checking and produces the same "invalid" indication
for a fault in the check circuit as for non-code inputs. So-called "totally
self-checking" (TSC) circuits, which are one of the classes of self-checking
circuits, were proposed by D.A. Anderson. This class of circuit has the property
that an assumed fault never causes an erroneous code output, in addition to the
attributes already cited. (For a rigorous definition of TSCC see reference 4.)

In this flight-control application, the fault model selected is the
so-called "stuck-at model," where hardware failures in a circuit are modeled
as some logic gate input or logic gate output lines stuck-at-1 (S-a-1) or
stuck-at-0 (S-a-0). Faults are said to occur when one or more lines become S-a-1
or S-a-0. Thus, when a single line is stuck, a single fault is said to have
occurred. Multiple faults, where more than one line is stuck, may occur. If
one or more lines become stuck at the same logic value, i.e., 1 or 0, a
unidirectional fault is said to have occurred.
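
The stuck-at model can be illustrated with a tiny gate-level simulation. The circuit used here, out = (a AND b) OR c, and the line names are purely illustrative assumptions; they are not taken from the SDFTP design.

```python
from typing import Dict, Optional


def evaluate(a: int, b: int, c: int,
             stuck: Optional[Dict[str, int]] = None) -> int:
    """Evaluate out = (a AND b) OR c, forcing any stuck lines to their stuck value."""
    stuck = stuck or {}

    def line(name: str, value: int) -> int:
        # A stuck line ignores its driven value and holds the stuck value.
        return stuck.get(name, value)

    a, b, c = line("a", a), line("b", b), line("c", c)
    n1 = line("n1", a & b)        # internal line: output of the AND gate
    return line("out", n1 | c)    # output of the OR gate


def is_unidirectional(stuck: Dict[str, int]) -> bool:
    """True when every stuck line is stuck at the same logic value."""
    return len(stuck) >= 1 and len(set(stuck.values())) == 1


if __name__ == "__main__":
    print(evaluate(1, 1, 0))                   # fault-free
    print(evaluate(1, 1, 0, {"n1": 0}))        # single fault: n1 S-a-0
    print(is_unidirectional({"a": 1, "c": 1}))
```

A single fault is one entry in the `stuck` dictionary; a multiple fault has several entries, and it is unidirectional when all entries share the same value.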

Once the fault model has been selected, the checker code can be
determined. The choice of the code can strongly affect the properties of the
checkers and functional blocks. Constant-weight and unordered codes have
been suggested (2,3) for the design of totally self-checking circuits. These
codes are used because the structure of the functional blocks and the assumed
faults always lead to unidirectional errors (4). Unordered codes have the
property that they can detect any unidirectional error. A consequence of the
use of unordered codes is that totally self-checking checkers must be built to
check them.
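
The detection property of unordered codes can be checked by brute force on a small example. The sketch below uses a constant-weight (m-out-of-n) code, which is one kind of unordered code; the function names are illustrative, and the exhaustive search is for demonstration only.

```python
from itertools import combinations
from typing import Iterator, Set


def constant_weight_code(n: int, m: int) -> Set[int]:
    """All n-bit words with exactly m ones (an m-out-of-n code)."""
    return {sum(1 << i for i in ones) for ones in combinations(range(n), m)}


def unidirectional_corruptions(word: int, n: int) -> Iterator[int]:
    """Every word reachable by flipping only 0->1 bits, or only 1->0 bits."""
    ones = [i for i in range(n) if (word >> i) & 1]
    zeros = [i for i in range(n) if not (word >> i) & 1]
    for k in range(1, len(zeros) + 1):
        for subset in combinations(zeros, k):
            yield word | sum(1 << i for i in subset)      # 0 -> 1 errors only
    for k in range(1, len(ones) + 1):
        for subset in combinations(ones, k):
            yield word & ~sum(1 << i for i in subset)     # 1 -> 0 errors only


def detects_all_unidirectional(code: Set[int], n: int) -> bool:
    """True if no unidirectional error turns one code word into another."""
    return all(bad not in code
               for word in code
               for bad in unidirectional_corruptions(word, n))


if __name__ == "__main__":
    print(detects_all_unidirectional(constant_weight_code(8, 4), 8))
    # An ordered pair of words: 0001 becomes 0011 by a single 0->1 flip.
    print(detects_all_unidirectional({0b0001, 0b0011}, 4))
```

Any unidirectional error changes the number of ones in a word, so it can never land on another word of the same weight; the second call shows how a code containing ordered words loses that guarantee.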

The code selected for the bit-slice processor partitions is the
triplication code, which is then partitioned into a trio of duplication codes
for input to the S.C.C. The code selected for the S.C.C. is the dual-rail code.
Thus, within the S.C.C., one of the signals of each of the pairs of signals of
the triplicated partition signals is inverted. In the S.C.C. device up to 16
triple inputs can be checked. Within the S.C.C.'s, each of the three pairs
of signals is checked by a self-checking tree consisting of 4-out-of-8 T.S.C.'s
as shown in Figure 9. The output of each tree is a dual-rail signal, in
which an error is indicated by a 0-0 or 1-1 combination. These three tree
outputs are combined with snapshots of the three current input signals and
the three inputs from the previous microcycle. These signals are saved in
three separate scanning registers as shown in Figure 10. These three
snapshots are identical except for the error field. The 6-bit error vector
of each snapshot is rotated with respect to the other two snapshot error fields
so that the dual-rail output of each of the three T.S.C. trees is available
externally. This permits a check on all input signals to the S.C.C. The
field definitions of the scanning register are shown in Figure 11.
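
The dual-rail convention described above, in which a valid signal is a complementary pair and 0-0 or 1-1 marks an error, can be sketched as follows. Note the substitution: this sketch uses the textbook two-rail checker cell chained over all pairs, not the 4-out-of-8 T.S.C. tree of Figure 9, and all names are illustrative.

```python
from functools import reduce
from typing import Iterable, List, Sequence, Tuple

Rail = Tuple[int, int]


def encode(copy_a: Sequence[int], copy_b: Sequence[int]) -> List[Rail]:
    """Dual-rail encode two copies of a signal by inverting the second copy."""
    return [(a, 1 - b) for a, b in zip(copy_a, copy_b)]


def two_rail_cell(x: Rail, y: Rail) -> Rail:
    """Textbook two-rail checker cell: output is complementary only if both inputs are."""
    x0, x1 = x
    y0, y1 = y
    return ((x0 & y0) | (x1 & y1), (x0 & y1) | (x1 & y0))


def check(pairs: Iterable[Rail]) -> Rail:
    """Reduce all dual-rail pairs to a single dual-rail result."""
    return reduce(two_rail_cell, pairs)


def is_error(signal: Rail) -> bool:
    """A 0-0 or 1-1 combination indicates an error."""
    return signal[0] == signal[1]


if __name__ == "__main__":
    good = [1, 0, 1, 1]
    bad = [1, 1, 1, 1]            # second bit disagrees with the good copy
    print(is_error(check(encode(good, good))))
    print(is_error(check(encode(good, bad))))
```

Once any input pair is non-complementary, the cell output stays non-complementary through the rest of the chain, so a single disagreement anywhere among the checked signals reaches the final output.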

By clocking the inputs and outputs of the three T.S.C.'s into the
three snapshot registers every microcycle, the current status of the interface
is captured. The current input (i) is shifted down to the previous microcycle
input field (i-1) on the occurrence of the next clock cycle. When an
error is detected by one of the S.C.C.'s, it causes the RRU Maskable S.C.C.
to indicate an error, which stops the processor clocks that clock the snapshots
into the snapshot register. Thus, the cause of the error and its location are
available for diagnosis by the RRU. Each snapshot register can be clocked by
a different processor clock if desired.
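
The capture-and-freeze behavior of a snapshot register can be modeled in a few lines. This is a behavioral sketch only; the class, its field names, and the `frozen` flag (standing in for the stopped processor clock) are assumptions, and the actual field layout is the one defined in Figure 11.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class SnapshotRegister:
    """Holds the current inputs (i), the previous inputs (i-1), and the error vector."""
    current: Tuple[int, ...] = ()
    previous: Tuple[int, ...] = ()
    error_vector: Tuple[int, ...] = ()
    frozen: bool = False   # models the processor clock having been stopped

    def clock(self, inputs: Sequence[int], error_vector: Sequence[int],
              error_detected: bool) -> None:
        if self.frozen:
            return                      # clocks are stopped: contents are held
        self.previous = self.current    # field (i) shifts down to field (i-1)
        self.current = tuple(inputs)
        self.error_vector = tuple(error_vector)
        if error_detected:
            self.frozen = True          # the error stops further capture


if __name__ == "__main__":
    reg = SnapshotRegister()
    reg.clock((1, 1, 1), (0, 1, 0, 1, 0, 1), error_detected=False)
    reg.clock((1, 0, 1), (1, 1, 0, 1, 0, 1), error_detected=True)
    reg.clock((0, 0, 0), (0, 1, 0, 1, 0, 1), error_detected=False)  # ignored
    print(reg.current, reg.previous, reg.error_vector)
```

Because the register stops on the microcycle in which the error appears, both the offending inputs and the inputs from the microcycle before remain available for the RRU to compare.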

Similarly, each snapshot register can be examined by clocking out the
contents. In the SDFTP, the RRU shifts out the snapshot by supplying a shift
clock to each register. Hence, the snapshots can be read independently by the
RRU computers. This occurs after the error outputs have been processed to
turn off the processor clocks, and the RRU computers have been alerted by
interrupt that an error has been detected. If an error is detected during the
first read-out, the read-out of the scanning register can be repeated, since the
information is recirculated as it is read out. Under some conditions only
part of the snapshot, such as the error vector, may be read out.

The outputs of the S.C.C. are connected to the RRU Maskable
S.C.C. trees and to the Input Buffers of the RRU computer. Each of these
S.C.C. trees combines all of the S.C.C. signals of the devices located
at the interface into a single dual-rail signal, which interrupts its associated
RRU computer and bit-slice processor clock. The other connection of the
interface S.C.C. involves only one third of all of the S.C.C. outputs. Each
RRU computer is connected to only one of the three scanning registers of each
of the S.C.C.'s. By connecting the second and third scanning registers of
each interface S.C.C. to different RRU computers, all three computers have
complete but independent access to the error vectors and S.C.C. inputs. By
shifting these scanning registers out under the shift clock control, as much of
the snapshot can be obtained as is needed for diagnosis.

Since each S.C.C. without mask can handle 16 or fewer inputs, each
of the four interfaces requires just two devices, with the exception of the
control store-processor array interface. This interface requires six devices since
there are fewer than 48 signals that must be protected at the partition output
and inputs. Thus a total of 12 devices are needed per SDFTP.

2.  Voter-Switch 

After a detected error has been diagnosed, the connections between
the bit-slice processor partitions may need to be changed, and the status of the
Voter-Switch devices at each interface, which permit this automatic
reconfiguration under the control of an RRU computer, may be altered. Each
Voter-Switch device incorporates a triplicated set of circuits for nine
triplicated signals. Each circuit consists of a three-input voter and a set of
three switches. Both the voter and the switches receive the same triplicated
signals, as shown in Figure 12. The voter computes the majority function of the
three inputs while the switch selects one of the three inputs for transmission
to the output. Only one or the other is connected to the output at any one time,
and the connection is controlled by a command register that can be set externally.
In the case of the SDFTP, the RRU computer transmits the command from its
output buffer to the Voter-Switch in serial form. The information is then clocked
into the command register by the command clock, which is also supplied by the
RRU computer. Each of the three sets of nine input signals is controlled by a
separate command register. For the SDFTP application each command register
is controlled by a separate RRU computer and can be loaded independently.
In Figure 12 a single bit slice is shown to the right of the dotted line and the
three command registers are shown to the left of the dotted line. Each
command register controls one of the three voter-switch circuits.

The voter-or-switch (VOS) bit of the command dictates which
device, voter or switch, will provide the output. When VOS is a "1", the voters
are selected, while a "0" selects the switches. The command bits C1, C2, C3
determine which of the switches is selected. A "1" in any of
these three bit positions selects the corresponding switch; e.g., C1 = 1 selects
the top of the three switches. Only one switch should be used, to maintain the
independence of the channels.
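
The command decoding just described can be sketched in software. This is an illustration under assumptions: the mapping of C1, C2, C3 to the first, second, and third inputs is a guess based on "C1 selects the top switch", the function names are invented, and words are modeled as Python integers so that one call covers all bits of an interface signal.

```python
from typing import Tuple


def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority function of three words."""
    return (a & b) | (b & c) | (a & c)


def voter_switch_output(inputs: Tuple[int, int, int],
                        vos: int, c1: int, c2: int, c3: int) -> int:
    """Return the voted word (VOS = 1) or the single switched-through input (VOS = 0)."""
    if vos == 1:
        return majority(*inputs)
    selected = [word for bit, word in zip((c1, c2, c3), inputs) if bit == 1]
    if len(selected) != 1:
        # Only one switch should be closed, to keep the channels independent.
        raise ValueError("exactly one of C1, C2, C3 should be set in switch mode")
    return selected[0]


if __name__ == "__main__":
    words = (0b101010101, 0b101010101, 0b111111111)   # third copy is faulty
    print(bin(voter_switch_output(words, vos=1, c1=0, c2=0, c3=0)))
    print(bin(voter_switch_output(words, vos=0, c1=0, c2=1, c3=0)))
```

In voter mode a single bad copy is masked bit by bit; in switch mode the RRU has already decided which copy to trust and routes it through directly.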

A total of 10 of these devices is required to provide this function
for all of the SDFTP interfaces. The processor array-microsequencer and
microsequencer-control store interfaces each require one Voter-Switch device.
The processor array-memory interface requires two devices, while the control
store-processor array interface requires six.
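
The device counts quoted for the S.C.C. (12) and the Voter-Switch (10) can be reproduced from the per-device capacities given in the text. The per-interface signal counts below are assumptions, chosen only because they are consistent with the quoted device counts; the text does not state them.

```python
import math

SCC_CAPACITY = 16           # triple inputs per S.C.C. device (from the text)
VOTER_SWITCH_CAPACITY = 9   # triplicated signals per Voter-Switch device (from the text)

# Assumed signal counts per interface (hypothetical upper bounds).
interface_signals = {
    "processor array-microsequencer": 9,
    "microsequencer-control store": 9,
    "control store-processor array": 48,
    "processor array-memory": 16,
}


def scc_devices(signals: int) -> int:
    """S.C.C. devices: one set on the partition outputs, one on the inputs."""
    return 2 * math.ceil(signals / SCC_CAPACITY)


def voter_switch_devices(signals: int) -> int:
    """Voter-Switch devices needed to carry all signals of an interface."""
    return math.ceil(signals / VOTER_SWITCH_CAPACITY)


total_scc = sum(scc_devices(n) for n in interface_signals.values())
total_vs = sum(voter_switch_devices(n) for n in interface_signals.values())

print("S.C.C. devices:      ", total_scc)
print("Voter-Switch devices:", total_vs)
```

With these assumed counts, the control store-processor array interface dominates both totals, needing six devices of each kind.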

D.  RRU  OPERATION 

1. Architecture Review

First, the architecture of the error reporting system will be
reviewed because a good understanding of the various parts of the system and
the way they interrelate will make it easier to understand the overall operation
of the RRU.

Figure 13 is a simplified overview of the processor and the SCC's.
Each of these components is discussed in detail in Section III G. The basic
processor partitions, including the microsequencer (µs), control store (CS)
and processor array (PA), are shown separated by voter/switches labeled V/S.
The control store and processor array require more than one voter switch
to handle all their output leads. The three processor channels are illustrated
with a three-dimensional effect.

Errors are reported by SCC's. A set of three SCC's labeled "Out"
is connected to the output of each processor partition. Likewise, the SCC's
labeled "In" are connected to the input of each processor partition. These
circuits check for errors at the input and output of each processor partition
and provide sufficient information to locate errors in either the partitions
or the switches. The error signals from these partition SCC's are merged
together via special masked SCC's, which provide interrupt signals for each
of three RRU's (reconfiguration recovery units). The interrupt signal is
generated whenever any unmasked error is obtained from the system.

Figure 13. Processor System Overview

The  interrupt  signals  cause  the  RRU's  shown  in  Figure  14  to  go  to 
an  error  processing  routine,  which  determines  the  cause  of  the  error  and 
issues  the  appropriate  commands  to  switch  data  signals  around  the  fault  in  an 
optimum  manner. 

The RRU processors are implemented with monolithic microprocessors
with appropriate memory and I/O circuits. They are described more fully in
Section III E. The RRU processors are operated independently for the most
part. They are loosely synchronized in that they can communicate with each
other and the maintenance panel by handshake signals. They also perform a type
of synchronized control when turning on the processor clocks such that the
processors are locked together. The interrupts that initiate the error response
processing are also provided in triplicate and each RRU processor can select
any one of the three.
It is important to understand, in some detail, both the manner in
which the processor system can be reconfigured with the voter switches and
the way in which the error signals are derived. The RRU system is the link
from the error signals to the voter switch controls. The number 2 voter
switch and the 2 OUT and 2 IN SCC's are shown in functional detail in
Figure 15. Each voter switch receives all three channel signals coming from
the microsequencer. The voter switch outputs can be derived by a vote of
all three inputs or they can be switched directly to any one of the three inputs.
Initially, the voter is used so that a single microsequencer failure is masked
by the voter. Should two microsequencers fail, then each switch is set to use
the one remaining microsequencer as an input. Each of the following control
stores (CS) would thus receive a correct input.

The number 2 OUT SCC is connected to the outputs of the three
microsequencers as shown in Figure 15. Likewise, the number 2 IN SCC is
connected to the inputs of the three control stores. The e1 error signal is
derived by comparing channels 1 and 2; e2 is derived from channels
2 and 3, and e3 from channels 3 and 1. The number 2 SCC IN error signals,
e1', e2' and e3', are derived in a similar fashion from the 1', 2' and 3'
signals.

The interpretation of these error signals is as shown in Figure 16.
Failures associated with each error pattern are listed there. The interpretation
of the first fault is relatively straightforward, as indicated by the entries in the
column titled Single Faults. The only entry that might seem confusing is the
interrupt error. This is detected when an interrupt yields no error signals
in any of the SCC's. Since the SCC outputs are only examined after an
interrupt has occurred, the only conclusion that can be reached is that the
interrupt circuitry has failed. The RRU circumvents the problem by switching
its interrupt input to one of the other two interrupt signals.
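
The pairwise comparison and the single-fault reading of its result can be
illustrated with a short Python sketch. Figure 16 is not reproduced here, so
the lookup table below is an assumption inferred from the text: the error
signal e1 compares channels 1 and 2, e2 compares channels 2 and 3, and e3
compares channels 3 and 1; a failed channel disturbs the two comparisons it
takes part in, a lone error signal is attributed to the SCC comparator itself,
and an interrupt with no error signals points at the interrupt circuitry. All
names are hypothetical.

```python
def error_signals(ch1, ch2, ch3):
    """Return (e1, e2, e3) from pairwise comparison of three channel words."""
    return (ch1 != ch2, ch2 != ch3, ch3 != ch1)


# Assumed single-fault interpretation, keyed by the (e1, e2, e3) pattern.
SINGLE_FAULT = {
    (False, False, False): "interrupt circuitry",
    (True, False, True): "channel 1",
    (True, True, False): "channel 2",
    (False, True, True): "channel 3",
    (True, False, False): "SCC comparator 1",
    (False, True, False): "SCC comparator 2",
    (False, False, True): "SCC comparator 3",
}


def interpret_single_fault(pattern):
    """Map an error pattern seen after an interrupt to a suspected failure."""
    return SINGLE_FAULT.get(pattern, "ambiguous: run diagnostics")
```

Note that a genuine data disagreement can never raise exactly one error
signal, which is why a lone signal is read as a checker fault in this model.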

Consider, now, the problem when a second fault is sensed in a
single output SCC. There are several possibilities for each of the double
error patterns. The RRU's now must resort to either running processor
diagnostics or performing a series of sequential tests to determine the cause
of the second fault.

The operation of the system where two faults occur in the same
SCC partition is best described by an example. Assume that the first fault
is the failure of the channel 1 microsequencer. The second fault is then
assumed to be the failure of the channel 2 microsequencer. This can result
in a situation where all three error signals indicate errors. As seen in
Figure 16, even though channel 1 had been identified as the first error, it
may be impossible to determine whether the second fault is due to a failure in
channel 2 or 3. Thus, a processor diagnostic is run to determine which
channel is at fault.

Another example is the failure of SCC 1 followed by the failure of
microsequencer 1, giving an error pattern consisting of e1 and e3. However,
it is not clear whether the second fault is due to an SCC 3 or a channel 1
failure, since either of these failures in combination with the original SCC 1
failure yields the same error pattern. Here, a sequential test is used.
The SCC 3 error is masked, the voter switch is switched to the channel 1
input and the main processor is restarted. If there is no error interrupt then
the second failure was, in actuality, an SCC 3 error. But if the failure was
microsequencer 1, the input SCC following the voter/switch will indicate a
channel 1 error, since channel 1 is being compared to the other voter outputs
at this point. Upon detecting a channel 1 error the switch is either thrown back
to the voter position or to channel 2 or 3.

It  should  be  emphasized  that  these  double-fault  problems  occur 
only  when  two  faults  are  reported  by  a  single  SCC.  Two  faults  each  occurring 
in  different  SCC’s  are  processed  as  successive  single  faults. 

Two  simultaneous  faults  in  a  single  SCC  are  recognized,  and 
diagnostics  are  used  to  determine  any  failed  channels.  The  remaining  faults, 
if  any,  are  assumed  to  be  SCC  errors. 

The error interpretation for input SCC's is similar, with the
exception that the channel errors are now caused by voter/switches. A failure
here results in switching to a good function input. This causes the error
pattern to revert to the no-error case, and a second error of this type will
cause a single fault pattern and be easily detected and identified.

Maps of the system errors for three system states are shown in
Figure 17. The map is laid out in the same order as the system components
shown in Figure 13. The blank boxes represent the SCC's and the other
boxes are the other elements in the system as labeled. The first map
illustrates the no-error conditions. The second map indicates e1 and e3
errors in SCC 2 OUT, which is a channel 1 µs fault. The third map adds a
channel 2 voter/switch fault to the previous fault, resulting in e1 and e2
errors in SCC 4 IN.

The voter/switches each switch nine bits of data, whereas the SCC's
report errors in 16-bit slices. This results in some overlap of switchable
slices between SCC's for the design described. This situation is depicted
in Figure 18. An example is SCC 3b, which monitors parts of slices 3b,
3c and 3d. There is no way to determine which 9-bit slice caused the error
by examining the error signals alone. Currently, the system assumes that
all three slices are bad and acts accordingly. It would be possible to distinguish
between these slices if the actual data were read from the SCC registers.
However, this would take considerable time and there seems to be no advantage
to isolating the fault to one slice for reconfiguration purposes. The exact
location of the fault for maintenance purposes would be determined by the
RRU self-test programs.

2.  Error Response Operations

The initial part of the RRU operational flow diagram, which performs
the error interpretation and reconfigures the system via the switches, is
shown in Figure 19. When the main processors are running, the RRU's are
either idle or running self test, waiting for an error interrupt. When the
interrupt occurs, the errors are read by the RRU processors via their I/O
ports. The e signals from all the SCC's are available to each RRU. These
are the same signals that are merged together with the masked SCC's to
provide the interrupt signals.
The order of the e signals on the RRU ports for one of the RRU
computers is depicted in Figure 20. All eight bits of port 1 and four bits
of port 2 are used. The signals are obtained by shifting them out of the
SCC registers, starting with e1 and ending with e3. The order shown in
Figure 20 is the order for the channel-1 RRU processor. The channel-2
processor receives the e2 signals followed by e3 and then e1. The channel-3
processor receives e3 first, followed by e1 and e2, in that order. The orders
cannot be the same because the same output pins must supply the masked
SCC's with all three error signals in parallel. The RRU must send a clock
to the SCC's each time it wants a new set of error signals. The signals on
one port are all read at the same time. An error is recognized if the two
signals of any e pair are the same. The normal flow then continues to the box
in Figure 19 labeled "save new error from lowest numbered SCC". The logic must
reject all error signals from previous faults that have been processed. These
errors remain in the system and may be read each time an interrupt is generated.
(They are prevented from causing interrupts by masking in the merging SCC's.)
The logic then selects the error bit combinations from the lowest numbered SCC.
This ensures that only one new error will be processed at a time.
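
The selection rule can be illustrated with a small Python sketch. It is a
hedged model only: the dictionary layout, the function name and the test for
"already processed" (an SCC whose current pattern equals the pattern recorded
for it earlier) are assumptions made for illustration, not details taken from
Figure 19.

```python
def select_new_error(error_bits, processed):
    """Pick the one new error to handle on this interrupt.

    error_bits: dict mapping SCC position -> (e1, e2, e3) pattern just read.
    processed:  dict mapping SCC position -> pattern already handled.
    Returns (scc, pattern) for the lowest numbered SCC that shows a new
    error, or None if every reported error has already been processed.
    """
    for scc in sorted(error_bits):
        pattern = error_bits[scc]
        if not any(pattern):
            continue  # this SCC reports no error
        if processed.get(scc) == pattern:
            continue  # old fault, still visible but already handled
        return scc, pattern
    return None
```

Because the scan stops at the first match, a second new error in a higher
numbered SCC is left for a later pass, which matches the one-at-a-time rule.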

The next step in the flow diagram of Figure 19 is to determine
whether this error report already exists in a fault list of previously detected
faults. If it is not in the list, the error is entered in the list and processing
is resumed. If this error has already occurred once it will be in the list. A
particular error is not designated a fault until the second time it occurs. This
prevents transient errors from reconfiguring the system needlessly.
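
A minimal Python sketch of this transient filter follows. The class and
method names are hypothetical, and identifying an error by the pair (SCC
position, error pattern) is an assumption; the report does not state how
entries in the fault list are keyed.

```python
class FaultList:
    """Treat an error as a fault only on its second occurrence."""

    def __init__(self):
        self._seen = set()

    def report(self, scc, pattern):
        """Record an error report.

        Returns False the first time a given error is seen (it is entered
        in the list and processing resumes) and True on any repeat (the
        error is now designated a fault).
        """
        key = (scc, pattern)
        if key in self._seen:
            return True
        self._seen.add(key)
        return False
```

For example, the first report of a pattern from SCC position 3 returns False
and the main processors are simply resumed; a second identical report returns
True and the reconfiguration logic is entered.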

Once a fault is detected it is tested to see if it comes from an input
or output SCC. If it is an input error some processing is always needed. It may
involve running a diagnostic or just reconfiguring. If the error is from an
output SCC it is checked to determine whether it is the first or second fault in
the SCC partition. A single partition fault is generally masked by the following
voter and no reconfiguration is needed. Second faults are generally more
complicated and require running some sort of diagnostic. The exact logic path
followed by a given fault is determined by the particular combination of faults.
There are several special cases that do not follow this diagram exactly.

A flow diagram showing the processor diagnostic procedure in more
detail is given in Figure 21. The diagnostic is selected by the RRU sending
a diagnostic starting address to the processor. The RRU then sets the
interrupt circuits into a mode that only allows interrupts to be generated at
the end of the diagnostic. If this were not done the processors would be
stopped as soon as the first error appeared in the diagnostic run. The RRU
processors, however, only know the correct results obtained at the end of
the diagnostic. Therefore, the diagnostic must run to completion to permit
the RRU to determine which channel has failed. The diagnostic is started by
enabling the main processor clocks; it then runs until the results at the end
of a diagnostic are in error, at which time an interrupt is generated. Flags
are tested to determine which routines should be run to interpret the test
results.

The major steps in the reconfiguration and masking processes are
shown in Figure 22. The first step is to generate the proper code word
to enable the clock signals needed to change the voter switch state. The
requirement of generating a code to turn on the clocks makes it quite unlikely
that a runaway RRU could accidentally gain access to the switch controls. The
next step in the reconfiguration is to create a switch command message from
the fault information. This command is then sent to the switches.

In  addition  to  setting  switches  the  new  errors  must  be  masked  from 
the  interrupt  circuits;  otherwise  they  will  continue  to  generate  interrupts.  This 
is  done  in  a  manner  similar  to  the  way  in  which  the  switches  were  changed.  A 
new  mask  image  is  created  from  the  fault  information  and  sent  to  the  masked 
SCC's.  Again,  a  clock  signal  is  required  to  send  the  mask  to  the  SCC's.  This 
clock  signal  is  enabled  by  the  same  code  word  that  enabled  the  switch  command 
clock. 


The main error processing flow diagram is completed in Figure 23.
Once reconfiguration is complete, or it is determined that no reconfiguration is
needed, the processor is sent the address of the rollback routine. The
rollback routine causes the processor to resume execution of the applications
program that was interrupted by the error. The execution starts at a prior
point, where the state of the processors and all interim computations were
saved. Thus, the computation is restarted at a point that was unaffected by
the error; the execution is restarted by signals that start the processor
clocks in synchronism.


Figure  23.  Final  Part  of  RRU  Operational 
Flow  Diagram 
The RRU's now exchange system status information off-line without
interfering with the processor operation. The main purpose of this exchange
is a form of self test to ensure that each processor reached the same result.
Should there be any disagreement, the bad RRU processor will be declared
down. This status exchange is also needed when failures occur in the
interrupt system, since these errors are not normally monitored by the other
RRU processors and a status exchange is the only way to update the status in the
other RRU processors.

The results of reconfiguration are then reported to the maintenance
system, which accepts status reports from each of the processors. If one
differs it assumes the disagreeing RRU processor is bad. At this point the
RRU's return to the idle or self-test loop they were executing when the
original interrupt occurred.
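
The agreement check used in both the status exchange and the maintenance
report amounts to a majority comparison of three status values. The Python
sketch below is an illustration under stated assumptions: the function name,
the use of a dictionary keyed by RRU name, and the handling of the case where
no two reports agree are all hypothetical.

```python
from collections import Counter


def find_disagreeing(reports):
    """Compare status reports from the three RRU processors.

    reports: dict mapping RRU name -> status value (any hashable value).
    Returns (agreed_status, bad_rrus). If no two reports agree, the
    result is (None, []) because no majority exists to judge against.
    """
    counts = Counter(reports.values())
    status, votes = counts.most_common(1)[0]
    if votes < 2:
        return None, []
    bad = sorted(name for name, s in reports.items() if s != status)
    return status, bad
```

A single disagreeing processor is thereby identified and can be declared
down, while full agreement returns an empty list.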

3.  Self  Test  Operations 

The self tests described in this section are a form of built-in tests.
They may be used at any time to determine the operating condition of either
the processors or the RRU system. This is in contrast to reacting only to
errors detected during application program execution, as described in the
previous section. The self-test operation does require the cooperation of the
bit-slice processors for all tests except the RRU processor diagnostic. Thus,
the processors will not be available for execution of application programs
during most of the testing. The self tests require many iterations of test
sequences with resulting long execution times. The entire sequence of tests,
if performed at one time, could last for several seconds.

The self-test capability would be used to determine the status of the
system, before and after a mission, to ensure that there are no failed
components. A second use for the self tests is to augment the error response
operations. The error response operation cannot directly detect errors
such as a partial RRU processor failure. It will diagnose such failures as
either a control failure in a voter switch or an SCC failure. The system
will, however, be correctly reconfigured, but it may take a longer time to
determine the correct reconfiguration than it would have if the error had
previously been detected by self test. An example best illustrates this point.
Assume that the line in the I/O port that controls a given voter switch module
is broken and undiscovered. Then the voter fails as a second fault. The error
response system will then attempt to switch to channel 1 to bypass the voter.
However, since the RRU is broken, nothing happens. As soon as the processors
begin execution the voter causes another error interrupt. The error response
system now assumes the channel-1 switch is bad and tries to switch to channel 2.
This sequence of restarts continues until all channels have been tried and the
entire voter switch is declared down. If the self tests had been run, the bad
switch control operation would have been previously detected so that, when
the voter failed, the system could immediately declare the entire voter switch
down and act accordingly.

The price for this added speed is a loss of processor time and
additional program memory in the RRU's. The loss of processor time could
be alleviated if there were a method of running the self tests periodically during
the mission, when the processors were not needed for executing application
software. This would, of course, depend on the exact nature of a given mission.


A flow diagram for the self tests is given in Figure 24. Each of
the tests is summarized in Figures 25 through 27. The first test is the
RRU processor self-test. This test does not require the main processor
and so may be run concurrently with applications software. It is an instruction
diagnostic that makes simple computations and compares the results to
precomputed values. Two bits on the I/O ports are provided for testing I/O
instructions.

The next test is an exchange of data via the intercommunication I/O
lines. It tests these lines as well as providing an opportunity for the RRU
processors to update the status of the system in each RRU processor. This
routine also runs concurrently with the main processor application software.

The voter switch test, the RRU clock controller test, and the SCC
error detection and masking test all require the cooperation of the main
processors. These tests are similar in that they attempt to exercise all
parts of the RRU system by using the main processors to provide known test
vectors at all partition interfaces. Manipulation of these test vectors coupled
with the reaction of the RRU system provides a means of diagnosing RRU
failures. These tests all rely on the error response system to sense the
state of the errors in order to make maximum use of that software. These tests
are quite involved and lengthy and are split into segments so that only one
segment need be executed at a time, thus minimizing the length of time that the
main processor must be tied up performing self test.

E.  RRU  DESIGN 

1.  Hardware Components

A block diagram of the RRU system is shown in Figure 28. The
error reporting SCC's, while not shown in this diagram, are considered a
part of the RRU. The functions and circuit details of the error reporting SCC's,
the masked SCC's, the voter switches and the main processors are described
in detail elsewhere in this report. These modules are described in this
section only in terms of their interface with the RRU processors.

The central control of the RRU lies in the RRU processors. Their
primary function is to read the state of the system as inputs through their input
ports and then compute the appropriate control signals to be output via their
output ports.

In addition to the RRU processors and their I/O ports there are
triplicated RRU clock controllers and main processor controllers. The RRU clock
controller's function is to limit access to the control of the error reporting
SCC registers, the masked SCC mask registers and the voter/switch switching
commands. It does this by gating all the clocks entering these functions from
the RRU processors. The reason for providing this function is to reduce the
probability that a failed RRU processor could inadvertently fail the entire
system by randomly manipulating the SCC's and voter switches. There are three
ways of enabling these clocks, all of which require a different code word to be
calculated by the RRU. The first method uses the voted outputs of the
interrupts from all three channels. Thus, anytime a normal error interrupt
occurs the clocks can be enabled with a code word. The second method is to enable
them when running tests by voting test control signals from all three RRU
processors. This, together with the RRU processor-generated code word and
a test signal from the main processor, will allow generation of clocks for
testing. The last method enables the clocks when two of the RRU processors
are down. It does this with just the good processor's interrupt signal and a
code word. The RRU clock controller is described in detail in Section III.G.

Figure 24. Self-Test Diagram

Figure 26. RRU On Line Self Tests (Voter Switch & Clock Controller)

Figure 27. RRU On Line Self Tests (SCC)

[Block diagram labels: ERROR REPORTING SCC'S, MAIN PROCESSORS, VOTER/SWITCHES]
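
The three enabling conditions can be summarized as a small truth function.
The Python sketch below is a behavioral illustration only, not the controller
logic of Section III.G: the code word values, the function names and the
argument layout are all hypothetical, since the report does not give the
actual code words here.

```python
# Hypothetical code words; the real values are not given in this section.
CODE_ERROR, CODE_TEST, CODE_SOLO = 0x5, 0xA, 0x3


def vote(a, b, c):
    """Two-out-of-three vote over three boolean signals."""
    return bool((a and b) or (b and c) or (a and c))


def clocks_enabled(code_word, interrupts, test_controls, main_test, rru_up, own):
    """Decide whether the gated clocks of RRU number `own` (0..2) are enabled.

    interrupts, test_controls and rru_up are 3-tuples of booleans, one
    entry per channel; main_test is the test signal from the main processor.
    """
    # Method 1: voted error interrupts plus the matching code word.
    if code_word == CODE_ERROR and vote(*interrupts):
        return True
    # Method 2: voted RRU test controls, main processor test signal, code word.
    if code_word == CODE_TEST and vote(*test_controls) and main_test:
        return True
    # Method 3: both other RRU's down; own interrupt plus code word suffice.
    others_down = all(not up for i, up in enumerate(rru_up) if i != own)
    if code_word == CODE_SOLO and others_down and rru_up[own] and interrupts[own]:
        return True
    return False
```

In every path a correct code word is required, which is the property that
keeps a runaway RRU processor from clocking the switches or masks by accident.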

The  main  bit-slice  processor  controller  is  used  to  start  and  stop  the 
main  processors.  Its  function  includes  turning  off  the  bit-slice  processor 
clocks  on  the  receipt  of  an  error  interrupt  from  the  SCC’s.  The  controls 
necessary  to  force  the  processors  to  start  at  any  one  of  a  number  of  test 
routines  or  to  go  to  the  rollback  point  are  also  provided.  These  controllers 
consist  of  a  few  gates  and  flip-flops. 

The RRU processors are implemented with monolithic microprocessors.
The baseline system used Intel 8048 single-chip microprocessors, which
have their own clock oscillators and require only the addition of a crystal and
two capacitors. Each also contains 1000 bytes of ROM, 64 bytes of RAM and a
timer on the same chip. The 8748 is another version of this microprocessor,
which has 1000 bytes of EPROM rather than ROM. The 8048 microprocessor
is intended for control purposes and its I/O port capability is easily expanded
to large numbers of I/O pins. It has a cycle time of 2.5 microseconds. Most
instructions require only one cycle but some do require two cycles. It is a
40-pin device.

The I/O ports used in the baseline are Intel 8355 combination ROM
and I/O ports. They contain 2000 bytes of ROM and two 8-bit I/O ports, which
can be configured on a bit basis to be either input or output pins. Data placed
on the outputs can also be read, so they form a convenient means for determining
which data is being output. There is an EPROM version designated the 8755.

The  8355  ROM  and  I/O  circuits  are  connected  in  a  straightforward 
fashion  to  the  8048.  All  of  the  necessary  signals  to  drive  the  8355’s  are  present 
on  the  8048.  The  8355’s  require  very  little  drive  current  and,  therefore,  line 
driving  buffers  are  not  needed.  An  example  of  an  8048  connected  to  one  8355 
is  shown  in  Figure  29. 

Alternatives to the 8048 microprocessor are either the Intel 8049 or
8085. The 8049 contains twice the amount of ROM and RAM but is otherwise
identical to the 8048. The 8085 is a version of the 8080 microprocessor that
has a self-contained oscillator and timer and is compatible with the 8355 memory
and I/O device. Both the 8049 and 8085 are available in faster versions than
the 8048. The instruction repertoire for the 8085 is different from the 8048 so
it is difficult to determine how much faster it might be in this application. Both
of these microprocessors are candidates for use if greater speed is desired.

2.  Description of RRU Signals

The details of the RRU design are probably best understood by
describing the function of each signal going to and from the RRU processor.
The input signals are listed in Figure 30 and the output signals are listed in
Figure 31. Signal positions in the list will be designated by an I for input
signals or an O for output signals, followed by a letter designating the major
heading and then by its number under the heading. Therefore, the number 1
output SCC signal is designated IA1.
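
The designation scheme is regular enough to be decoded mechanically. The
following Python sketch is offered only as an illustration of the naming
rule; the function name is hypothetical, and restricting the heading letter
to A through F is an assumption based on the headings that appear in
Figures 30 and 31.

```python
import re


def parse_designation(name):
    """Split a signal designation such as 'IA1' or 'OE5' into its parts.

    Returns (direction, heading_letter, number), where direction is
    'input' or 'output'.
    """
    match = re.fullmatch(r"([IO])([A-F])(\d+)", name)
    if match is None:
        raise ValueError(f"not a valid signal designation: {name!r}")
    direction = "input" if match.group(1) == "I" else "output"
    return direction, match.group(2), int(match.group(3))
```

For instance, OE5 decodes to output heading E, position 5, which is the
code word clock discussed later in this section.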
A.  SCC ERROR/REGISTER OUTPUTS

     1.  1  - OUT
     2.  1  - IN
     3.  2  - OUT
     4.  2  - IN
     5.  3a - OUT
     6.  3a - IN
     7.  3b - OUT
     8.  3b - IN
     9.  3c - OUT
    10.  3c - IN
    11.  4a - OUT
    12.  4b - IN

B.  INTERCOMMUNICATIONS

     1.  READY A IN
     2.  READY B IN
     3.  READY P IN
     4.  SEND/RECEIVE A IN
     5.  SEND/RECEIVE B IN
     6.  SEND/RECEIVE P IN
     7.  DATA A IN
     8.  DATA B IN

C.  INTERRUPT

     1.  INT 1
     2.  INT 2
     3.  INT 3

D.  I/O INSTRUCTION TEST

     1.  I/O TEST IN

E.  PROCESSOR CLOCK CONTROLLER

     1.  TEST A
     2.  TEST B

Figure 30. Input Signals
A.  SCC MASKS

     1-2.  MASK 1
     3-4.  MASK 2
     5-6.  MASK 3

B.  VOTER SWITCH CONTROL

     1.  V/S 1
     2.  V/S 2
     3.  V/S 3a
     4.  V/S 3b
     5.  V/S 3c
     6.  V/S 3d
     7.  V/S 3e
     8.  V/S 3f
     9.  V/S 4a
    10.  V/S 4b

C.  INTERCOMMUNICATIONS

     1.  READY A OUT
     2.  READY B OUT
     3.  READY P OUT
     4.  SEND/RECEIVE OUT
     5.  DATA OUT

D.  MISC. CONTROL

     1.  SCC REGISTER SELECT
     2.  CLEAR
     3.  I/O INST. TEST

E.  RRU CLOCK CONTROLLER

     1-4.  CODE WORD
     5.  CODE WORD CLOCK
     6.  MASK CLOCK
     7.  V/S 1-3f CLOCK
     8.  V/S 4a-4b CLOCK
     9.  SCC 1-3b CLOCK
    10.  SCC 3c-4b CLOCK
    11.  RRU TEST CONTROL

F.  MAIN PROCESSOR CONTROL

     1-5.  INTERRUPT VECTOR
     6.  RRU INTERRUPT BLOCK
     7.  CONTINUE
     8.  A CHANNEL BYPASS
     9.  B CHANNEL BYPASS

Figure 31. Output Signals
The first set of signals to be discussed are the RRU clock controller
signals. As mentioned previously, the function of the clock controller is to
limit inadvertent access to those modules controlled by the RRU processor.
The signals OE1 through 4 are the lines over which the RRU processor sends
a code word to the controller. Four parallel lines are used to speed up the
sending of the code word. The code word requires a clock to enter it into the
controller. This clock is signal OE5 and is simply another output pin, which
is alternately set to one and zero by software in the RRU processor.

The next five signals are the actual clocks that are used to clock
masks to the masked SCC's, change the settings of the voter switches and read
the error SCC registers. The last signal, OE11, is a signal that is voted with
corresponding signals from the other RRU processors to enable the clocks for
test purposes.

The interrupt signals IC1-3 are the three interrupt signals generated
by the masked SCC's. These signals are provided for testing purposes. The
interrupt system is shown in Figure 32. Each set of masked SCC's generates
an interrupt signal. All three signals are supplied to three switches, each of
which supplies an interrupt to one of the RRU processors. Should this signal
fail such that it creates an interrupt, the interrupted processor will perform
the normal error routines and find no errors to account for this interrupt. It
then samples the INT signal that was passing through its interrupt switch. If
that signal is high, then the problem is in the masked SCC's and the switch can
be changed to use one of the other interrupt signals. If the INT signal was low,
then the error is between the masked SCC's and the RRU processor, and nothing
more can be done to reconfigure the system; that RRU will then be declared
down. The other two INT signals are provided for those cases when they are
the signals used to interrupt the RRU processor. The opposite problem (no
interrupt is generated when one should be) is detected when status is exchanged
between processors. At this point the affected processor can test the INT lines
and reconfigure as previously described.
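
The diagnosis described above reduces to a small decision procedure. The
following Python sketch is illustrative only: the function name, argument
names, and enumeration values are hypothetical and are not taken from the
RRU software.

```python
from enum import Enum


class Diagnosis(Enum):
    GENUINE_ERROR = "process the reported errors normally"
    SWITCH_INTERRUPT_SOURCE = "masked SCC's at fault: select another INT signal"
    DECLARE_RRU_DOWN = "fault between masked SCC's and RRU processor"


def diagnose_interrupt(errors_found: bool, int_line_high: bool) -> Diagnosis:
    """Classify an interrupt received by one RRU processor.

    errors_found  -- True if the normal error routines found an error
                     that accounts for the interrupt.
    int_line_high -- sampled state of the INT signal that feeds this
                     processor's interrupt switch.
    """
    if errors_found:
        return Diagnosis.GENUINE_ERROR
    if int_line_high:
        # INT is asserted with no error behind it: the masked SCC's
        # are suspect, so another interrupt signal can be selected.
        return Diagnosis.SWITCH_INTERRUPT_SOURCE
    # INT is low at the source, yet the processor was interrupted:
    # no reconfiguration is possible and this RRU is declared down.
    return Diagnosis.DECLARE_RRU_DOWN
```

The same classification would apply when a missing interrupt is discovered
during the status exchange, with the affected processor sampling its INT lines
before reconfiguring.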

Figure 32 also shows signals going from the RRU processors to the
masked SCC's. These are the mask signals OA1-6. A mask image is sent
serially in two-rail fashion over two lines. The data are entered by means of
a mask clock, OE6, from the clock controller.

There is one line from each of the error reporting SCC's going to
each RRU processor. They are input signals IA1-12. The errors as well as
the register contents are read from these lines by clocking them with the
appropriate clock signal, OE9 or 10, from the RRU clock controller. There
is an additional control, OD1, called SCC register select. This connects either
the i-1 register or the i register to the error SCC output. This is important
when a control store error has occurred. The SCC register at the control store
output must output the current or i value, since it is this value that must be
checked for errors. However, the SCC register at the input to the control store
must also be read to obtain the address to be used to look up the correct output
in the control store copy in the RRU processor. The address that caused the
error was the previous or i-1 value, which is also stored in the registers. If
the SCC output were not switched to the i-1 register by the register select bit,
the RRU processor would have to read through the 48 bits of the i value first,
and this would slow the processing considerably.

The voter switch commands, to enable either the voter or individual
channel inputs, are sent to the voter/switches via signals OB1-10. The
command is sent serially out to the appropriate line by clocking the voter
switches with either the OE7 or 8 clock signals, which pass through the RRU
clock controller.

There are several input and output lines used to communicate between
the RRU processors and the maintenance panel. The input signals are IB1-8
and the output signals are OC1-5. The ready-in, ready-out and send/receive
in and out signals are used as handshake signals between RRU processors and
the test panel. The ready-in signal signifies that the other processor is ready
to transmit or receive, depending on the state of the send/receive line. The
ready-out and send/receive out are the handshake signals being sent out by the
processor. Only one send/receive out signal is needed since it can be bussed
to all the other processors. There must be separate ready-out signals so that
only one processor responds at a time. The data-in lines come only from the
two other processors, since the maintenance panel does not transmit data. The
data-out signal is bussed to all the other processors. The processors normally
stay in the receive mode until they wish to transmit. Should the processor that
was to receive a message switch to the transmit state at the same time as the
first processor, the handshakes will prevent the transmission from occurring.
Normally the order of transmissions is fixed in the programs so that this
problem does not arise.

The main processor clock control uses signals OF1-6 and 11.12.
The signals OF1-5 comprise an interrupt vector, which is sent to the main
processor to cause it to jump to its next routine. It is used to send the main
processor either to a rollback point or to a test routine. The processor is
then restarted using the continue signal. If a test program is to be run, the
RRU interrupt block signal, OF6, is enabled. This signal prevents the main
processor from being stopped, before the end of a diagnostic, by errors early
in the diagnostic. It is important that this be done because the RRU can only
interpret test values at the end of a diagnostic. The bypass and test signals
determine whether the other RRU processors are failing to issue a continue
signal, in which case they can be bypassed.

The I/O test signals, ID1 and OD3, are used by the RRU processor
self-test diagnostic to test its I/O instructions, by sending data out one pin and
reading via the other pin. It can accomplish this without disturbing the data
currently on the other output lines.

The last signal, OD2, is a clear signal which can be used by the RRU
processor to initialize the various circuits in the system.

3.  Maskable  SCC  Tree 

The function of the Maskable SCC Tree of each of the RRU error-
processing channels is to reduce the large number of signals (12) to a single
signal as shown in Figure 32. The SCC outputs of three of the interfaces
(see Figure 13 for the numbered interfaces) can be accommodated in one
SCC with Mask, while the SCC outputs of interface 3 are delivered
to the second SCC with Mask, since each of these SCC's can handle 24 inputs
or less. The outputs of these two devices are combined in a second
checker, a four-out-of-eight checker whose dual-rail output is converted to
single-rail form by an exclusive-OR circuit. This is done to match the RRU's
interrupt signal requirements.

These SCC with Mask devices differ significantly from those used
in the bit-slice processor interfaces in two ways. One of these is the wider
input capability, as shown in Figure 33. Another is the incorporation of a
masking capability to block signals known to produce error signals from
descending the tree and producing undesired error indications. The mask is
computed by each tree's RRU computer, based on the current error state of
the SDFTP, and is usually changed after the detection of a solid error. By
having the RRU update the mask after each detection, it is possible for the
SDFTP to continue operating after a number of errors have been detected,
without the known faults continually interrupting the SDFTP every time a
vector occurs that exercises one of these faults.

The outputs of all three of the SCC trees' E-OR gates can be
selected to interrupt any of the RRU computers by means of a multiplexer as
shown in Figure 32. These three E-OR outputs are made available to the
RRU computer by connecting them to the RRU computer's input buffers. This
permits each RRU computer to isolate failures to the SCC trees.


F.  RRU  PROCESSOR  SOFTWARE 

1.  Design

The RRU processor software is divided into two parts. The first
part implements the error response processing. This processing was
discussed in Section III.D and its operational flow diagrams are contained in
the figures of that section, including Figures 21, 22, and 23. The second part
implements the self-test processing discussed in Section III.D and is
diagrammed in Figure 24. The function of the error response system is to react
to errors reported in the self-checking checkers, reconfigure the main
processors, if necessary, by controlling the switches, and report the fault to
the maintenance system. The self-test system provides a means of testing
portions of the RRU that cannot be guaranteed to be directly tested by the
self-checking checkers. This function is useful for built-in test functions, to
ascertain the health of the system before a mission, or to supplement the error
response system by providing information that may in some instances speed up
the error response processing.

Whenever  possible,  common  routines  are  made  into  subroutines  to 
save  program  memory  space.  This  usually  results  in  slightly  longer  running 
times  so  that  only  the  longer  routines  are  made  into  subroutines. 

The sizing estimates and running times for those parts of the system
that require software diagnostics cannot be accurately estimated, for it is
beyond the scope of this work and is too early in the design to derive accurate
estimates of coverage (percent of circuitry tested) versus program size and
running time. The attendant problems are discussed in some detail in Section
IV.

2.  Program Size Estimates

Estimates of the program sizes for both the error response and
self-test software are given in Figure 34. The error response software requires
3200 bytes of memory, the self-test software slightly less, at 3000 bytes.
However, it is not accurately known how much coverage is obtained for this
amount of code. The control store tables in item 3 provide a fourth copy of the
main processor's control store to help determine which control store is still
functioning after two of the three have failed.

The total program storage is 9200 bytes. Each RRU processor has
11,000 bytes of ROM. However, 4000 bytes are connected as data memory to
access the control store tables more efficiently. The control store tables
require only 3000 bytes, leaving 1000 bytes unused. If this memory were to be
used as program storage, a few more gates would be needed to give access to
it as program memory. Currently, without this 1000 bytes, there are still 800
bytes of spare program memory.
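
The memory budget above can be checked with a few lines of arithmetic. The
Python sketch below uses only the byte counts quoted in the text and in
Figure 34; the variable names are illustrative.

```python
# Byte counts per RRU processor, as quoted in the text and Figure 34.
ROM_TOTAL = 11_000
ROM_AS_DATA = 4_000            # portion connected as data memory
ERROR_RESPONSE = 3_200
SELF_TEST = 3_000
CONTROL_STORE_TABLES = 3_000

# Total storage requirement (all three items of Figure 34).
total_storage = ERROR_RESPONSE + SELF_TEST + CONTROL_STORE_TABLES

# ROM left for programs once the data-memory portion is set aside.
program_rom = ROM_TOTAL - ROM_AS_DATA

# Spare program memory after the two software packages are stored.
spare_program = program_rom - (ERROR_RESPONSE + SELF_TEST)

# Data memory not occupied by the control store tables.
unused_data = ROM_AS_DATA - CONTROL_STORE_TABLES

print(total_storage, spare_program, unused_data)  # 9200 800 1000
```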

The self-test software uses the error response software to a large
extent in performing tests. Thus the 3000 bytes for the self-test software is,
in reality, the amount of additional software needed to perform the self-test
function.


                                 ESTIMATED BYTES
                                  PER PROCESSOR

1.  ERROR RESPONSE                    3200

2.  SELF TEST                         3000

3.  CONTROL STORE TABLES              3000

    TOTAL                             9200

Figure 34.  RRU Program Size Estimates

3.  Error  Response  Timing  Estimates 

The running time of the error response routines is largely dependent
on the exact combination of failures in the system at any one time, because of
the many different branches in the program logic. Therefore, the running time
estimates are illustrated by five examples given in Figure 35.

                                        Estimated Time in Microseconds

                                                       Sum of
                                       1st     2nd     1st & 2nd    Status
                                      Error   Error   (Off Line)   Reporting

1.  Microsequencer 1 Failure          1800    2000       3800       16,400

2.  Microsequencer 1 Prior Fault      2400    3400       5800       16,400
    Plus a Current Failure in V/S 2
    SCC 4 (Port 2)

3.  Microsequencer 1 Prior Fault      2100    3100       5200       16,400
    Plus a Current Failure in V/S 2
    SCC 2 (Port 1)

4.  Microsequencer 1 Prior Fault      2100    8200     10,300       16,400
    Plus a Current Failure in
    Microsequencer 2. This is a
    second fault in a partition.

5.  Microsequencer 1 Prior Fault      1400    2200       3600       16,400
    Plus an Interrupt Circuit
    Failure

Figure 35.  Error Response Timing Estimates

The first two examples are the same as those illustrated in the error
maps in Figure 17. The first example is the failure of the number 1
microsequencer with no other failures in the system. The software requires each
failure to occur twice before it is detected as a fault. This prevents
transient errors from needlessly reconfiguring the system. Thus a failure of
the number 1 microsequencer will cause the error response software to take 1.8
milliseconds to process the first occurrence of the error. If the error is a
hard failure, the system will immediately be interrupted with the second error,
which takes two milliseconds to process. Thus, the main processors will have
been off-line for the sum of these two times, as indicated in the third column
in Figure 35. This example requires a total of 3.8 milliseconds before the main
processors are restarted at the previous rollback point.

If  this  error  were  only  a  transient  then  only  the  1.8  milliseconds  for 
the  first  error  would  be  taken  away  from  the  main  processor. 

The  RRU  then  takes  an  additional  16.4  milliseconds  to  report  the 
failure  and  checks  that  all  RRU  processors  have  reported  the  same  failure. 

This  reporting  process  is  the  same  for  all  tests. 

The second example adds a Voter-Switch error to the number 1
microsequencer failure. The errors are reported through a different port for
this Voter-Switch, slightly affecting the timing. The time to process the two
errors is 5.8 milliseconds.

The third example is similar to the second example, except that the
Voter-Switch error is reported through the same port as the microsequencer
error, reducing the running time by 0.6 millisecond, to 5.2 milliseconds.

The fourth example illustrates two failures in one partition,
which is more difficult to handle. The processing time has almost doubled here
because a microsequencer diagnostic must be run in the main processor. The
total time required here is 10.3 milliseconds. The running time, in this case,
is largely dependent on a main processor diagnostic running time, for which
no accurate estimates are available.

The  last  example  illustrates  an  interrupt  circuit  failure,  where  the 
interrupt  is  stuck  in  the  "ON"  state.  This  failure  requires  3.6  milliseconds. 

Thus the typical error-handling times range from 3.6 to 10.3
milliseconds.
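
The relationship between the columns of Figure 35 can be reproduced with a
short calculation. The Python sketch below is illustrative; the times are the
Figure 35 estimates, while the function and variable names are hypothetical.

```python
# (first-error, second-error) processing times in microseconds,
# indexed by example number, as estimated in Figure 35.
EXAMPLES = {
    1: (1800, 2000),
    2: (2400, 3400),
    3: (2100, 3100),
    4: (2100, 8200),
    5: (1400, 2200),
}
STATUS_REPORTING_US = 16_400   # same for every example


def off_line_time_us(example: int, hard_failure: bool = True) -> int:
    """Time the main processors are off-line for one example.

    A hard failure is seen twice (first and second occurrence); a
    transient costs only the first-occurrence processing time.
    """
    first, second = EXAMPLES[example]
    return first + second if hard_failure else first


sums = {n: off_line_time_us(n) for n in EXAMPLES}
print(sums)                                   # third column of Figure 35
print(min(sums.values()) / 1000, max(sums.values()) / 1000)  # 3.6 10.3
```

Status reporting is kept separate from the off-line time because the text
describes it as additional time taken by the RRU after the main processors
have been restarted.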

4.  Self Test Timing Estimates

The self test timing estimates are given in Figure 36. The RRU
processor tests, which take 23.7 milliseconds, are run without disturbing the
main processor operation. The cooperation of the main processor is required to
carry out the bulk of the testing on all the switches, self-checking checkers
and other miscellaneous circuitry. The running time for these tests is given as
item two in Figure 36. The complete set of tests is estimated to run for three
seconds. This is not objectionable for built-in test requirements, but would be
prohibited during a mission. These tests were made more useful by dividing
them into small segments, each of which could be run at different times. The
typical length of these segments is given in line 3 as 3.6 milliseconds. This
places each segment in the same time range as the error response routine
times. Again, these estimates depend heavily on main processor diagnostics
for which there are no accurate timing estimates available.


The  last  item  is  the  16.4  milliseconds  needed  for  reporting  each 
failure  as  it  is  detected. 


                                          Estimated Time
                                          in Microseconds

1.  RRU Processor Tests                   23,700
    (Main Processors on Line)

2.  Tests Involving Main Processor        3,147,000 (3 seconds)

3.  Single Segment From #2 Above          3,600

4.  Status Reporting (Main Processors     16,400
    on Line)

Figure 36.  Self Test Timing Estimates


G.  CUSTOM DEVICES


1.  Summary 

A number of circuits have been designed to facilitate the
implementation of a self-diagnosing processor using dynamic redundancy. Most of
the circuits are important in terms of the efficient realization of the concept
and are predicated on large-scale integrated (LSI) circuit embodiment of a
number of functions. The microsequencer device is desirable in any processor in
that it reduces the parts count. In the SDFTP, it is important because the
microsequencer partition is replicated three times in the design and, thus, the
parts reduction is increased by a factor of three.

The other four circuits are integral to the implementation of the
self-diagnosis and fault tolerant concepts. Two self-checking circuits, which
detect errors not only in their inputs but within the circuits themselves, are
defined. They are the Self-Checking Checker (S.C.C.) Without Mask and
With Mask. These LSI circuits are used between the bit-slice processor
partitions and in the RRU, respectively. The S.C.C. Without Mask is the
realization of three totally self-checking checkers (T.S.C.) and three scanning
registers. The S.C.C. With Mask accepts a larger input vector that can be
masked, so that selected inputs are rendered ineffective in the code check of
the device. However, it has only a single T.S.C. and therefore, unlike the
S.C.C. Without Mask, cannot check the triplicated input.

The Voter-Switch is an important circuit with respect to the fault
tolerance capabilities of the design because it provides the reconfiguration
capability. It consists of a triplicated voter and three-way switch, which
can be used to connect triples of inputs to triples of outputs. In the voter,
each output is the majority function of the three inputs. The choice of the
interconnection can be externally selected.

The last circuit in the complement of these new LSI circuits is
the clock controller. It is applied in the RRU and is designed to prevent
inadvertent clocking of information into the Voter-Switch devices and thereby
possibly changing the interconnection. The clock signals are interlocked so
that the majority of the RRU computers control the clock and/or these
computers supply a code word for one of three modes of operation.

2.  Self-Checking Checker Without Mask


The Self-Checking Checker (S.C.C.) Without Mask is a device that
checks the outputs of the SDFTP bit-slice partitions. It is designed to detect
errors at its inputs as well as in the circuit itself. In addition to indicating
the presence of an error at its outputs, it saves a copy of the input that is
currently being checked as well as the previous input. The device is designed
to check three sets of identical 16-bit inputs using self-checking circuits.
The three sets of inputs are grouped into three different pairs of 16-bit
inputs. Each pair of signals is encoded in a dual-rail unordered code, which is
checked by a self-checking tree as shown in Figure 37. The two-level tree is
composed of four-out-of-eight (4/8) self-checking checkers. The single
second-level (4/8) self-checking checker has a dual-rail output, encoded so
that the 0-0 and 1-1 output combinations indicate that an error has been
detected. The three trees are totally self-checking (TSC) in the sense that an
assumed error never causes an erroneous output.
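
The dual-rail convention, in which 0-1 and 1-0 are valid outputs and 0-0 and
1-1 indicate an error, can be illustrated with a small behavioral model. The
Python sketch below is an assumption-laden illustration: it uses the classic
two-rail checker cell rather than the four-out-of-eight checkers of Figure 37,
it chains the cells linearly rather than in a two-level tree, and all names
are hypothetical. It models only the input-checking behavior, not the
self-checking property of the hardware.

```python
from functools import reduce


def two_rail_cell(p, q):
    """Combine two dual-rail pairs into one dual-rail pair.

    The output pair is complementary (0-1 or 1-0) only when both
    input pairs are complementary; otherwise it is 0-0 or 1-1.
    """
    (x0, y0), (x1, y1) = p, q
    f = (x0 & x1) | (y0 & y1)
    g = (x0 & y1) | (y0 & x1)
    return f, g


def compare_words(a: int, b: int, width: int = 16):
    """Check two nominally identical words; return a dual-rail pair.

    Bit i of word a is paired with the complement of bit i of word b,
    so each pair is a valid dual-rail code word only when the bits agree.
    """
    pairs = [((a >> i) & 1, ((b >> i) & 1) ^ 1) for i in range(width)]
    return reduce(two_rail_cell, pairs)


def is_error(pair) -> bool:
    """0-0 and 1-1 indicate that an error has been detected."""
    return pair[0] == pair[1]


print(is_error(compare_words(0x1234, 0x1234)))  # False
print(is_error(compare_words(0x1234, 0x1235)))  # True
```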


Figure 37.  Totally Self-Checking Checker (TSC)


The outputs of the three TSC trees are saved in scanning registers.
The three dual-rail outputs are clocked into the three scanning registers along
with the three sets of 16-bit inputs as shown in Figure 38. Each of the register
clock lines is brought out separately so that each of the scanning registers can
be individually clocked, as in the case of the SDFTP.


The three scanning registers save the three sets of 16-bit inputs and
the T.S.C. outputs; the three sets of 16-bit inputs from the previous clock
cycle are also saved. This snapshot is updated every clock cycle, with the
current input sets shifted down to the previous cycle's locations. The format
of the snapshot captured by the scanning register is shown in Figure 39. The
three T.S.C. outputs are located at the output ends of the registers. The error
information, consisting of the outputs of the T.S.C.'s, is rotated in each
register with respect to the others so that the three T.S.C. outputs are
simultaneously available. This makes it possible to detect all errors detected
by the T.S.C.'s, as is done in the RRU S.C.C. With Mask.

The snapshots of the inputs and their T.S.C. outputs can be read out
by supplying a clock signal to serially shift the data out. Since the shift
clocks of the three scanning registers are separate, the shift out of the
snapshots can be done independently by supplying the appropriate clock signals.
The registers are designed to circulate the information as it is read out so
that, if desired, the snapshot can be retransmitted by supplying additional
shift pulses.
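
The circulating read-out can be pictured with a short behavioral model. The
Python sketch below is illustrative only: the class name, method names, and
the six-bit register length in the usage lines are hypothetical and do not
reflect the format of Figure 39.

```python
from collections import deque


class ScanningRegister:
    """Behavioral model of a scanning register that recirculates."""

    def __init__(self, length: int):
        self.bits = deque([0] * length)

    def capture(self, snapshot):
        """Parallel-load a snapshot, given as a list of bits."""
        if len(snapshot) != len(self.bits):
            raise ValueError("snapshot length must match register length")
        self.bits = deque(snapshot)

    def shift_out(self) -> int:
        """One shift clock: emit the output-end bit and feed it back in."""
        bit = self.bits.popleft()
        self.bits.append(bit)
        return bit

    def read(self):
        """Shift out the whole register; the contents are left intact."""
        return [self.shift_out() for _ in range(len(self.bits))]


reg = ScanningRegister(6)
reg.capture([1, 0, 1, 1, 0, 0])
first = reg.read()
second = reg.read()          # retransmission with additional shift pulses
print(first == second)       # True
```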

The S.C.C. Without Mask can accept up to three sets of 16 inputs
and provides three dual-rail outputs. It is planned to be implemented in a
64-pin package and has a gate complexity of the order of 2200 gates. See
Figure 40.

Figure 39.  Scanning Register Format

Figure 40.  S.C.C. Without Mask Pin Usage

3.  Self-Checking Checker (S.C.C.) With Mask

The Self-Checking Checker (S.C.C.) With Mask performs a function
similar to the S.C.C. Without Mask; that is, it detects errors in its input and
in the checker itself. Like the S.C.C. Without Mask, the design is based on a
Totally Self-Checking Checker (T.S.C.) tree. However, since the function of
the S.C.C. With Mask, in the SDFTP, is to reduce the size of the SDFTP error
vector, consisting of all of the S.C.C. Without Mask outputs, the input domain
consists of pairs of signals where each bit is encoded in a dual-rail code.
Thus a single tree is sufficient to reduce this set of S.C.C. Without Mask
outputs to a single dual-rail output having the same unordered code encoding.
An error is indicated by the 0-0 or 1-1 signals.


This T.S.C. tree is a three-level tree. The first level is composed
of 4/8 T.S.C.'s and this is followed by a pair of 3/6 T.S.C.'s. The last level,
which produces the output, is a (2/4) T.S.C. As in the case of the S.C.C.
Without Mask, the output of the T.S.C. tree is saved in a scanning register as
shown in Figure 41.

The T.S.C. output is combined with the two dual-rail encoded 24-bit
inputs to form the snapshot. This information is captured in the scanning
register by using the computer clock line to set up the signals in the register
flip-flops. In this device only the current inputs are saved, as contrasted with
the S.C.C. Without Mask device, which saves both the current input vectors and
the immediately preceding clock-cycle inputs.

Since the function of this device is to reduce the number of S.C.C.
Without Mask error signals, and it must continue to operate after one or more
errors have been detected, those outputs that are known to be sources of errors
must be masked out. Hence, a mask register, which can be externally loaded
and will block the input signals that are not to participate in the overall
error generation (those that come from devices that have faults), is provided.

The  blocked  signals  are  replaced  by  properly  encoded  signals.  A 
one  in  the  mask  blocks  the  input  signal  while  a  zero  allows  the  input  signal  to 
be  processed  by  the  device  circuits. 

New masks are set into the mask register by serially shifting in the
mask bits two at a time under the control of the mask clock. The operation of
the mask part of the device can be verified by monitoring the Mask Out
output when the shift clock is supplied. As new information is read in, the
previous contents of the shift register are shifted out.
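
The effect of the mask on the error indication can be sketched in a few
lines. The Python sketch below is a behavioral illustration under stated
assumptions: the substituted pair 0-1 is an assumed choice of "properly
encoded signal", the names are hypothetical, and the error test simply looks
for any non-complementary pair instead of modeling the T.S.C. tree.

```python
VALID_PAIR = (0, 1)   # assumed substitute; any complementary pair would serve


def apply_mask(pairs, mask):
    """Replace each input pair whose mask bit is one by a valid pair.

    A one in the mask blocks the input; a zero lets it through.
    """
    if len(pairs) != len(mask):
        raise ValueError("one mask bit is required per input pair")
    return [VALID_PAIR if m else p for p, m in zip(pairs, mask)]


def any_error(pairs) -> bool:
    """An input pair of 0-0 or 1-1 signals an error."""
    return any(r0 == r1 for r0, r1 in pairs)


inputs = [(0, 1), (1, 1), (1, 0)]                # second source is faulty
print(any_error(inputs))                         # True
print(any_error(apply_mask(inputs, [0, 1, 0])))  # False
```

With the known-faulty source masked, a new error on any unmasked input would
still be reported, which is the property the RRU relies on to keep operating
after earlier faults.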

Pin  usage  is  shown  in  Figure  42.  Gate  complexity  is  about  800. 

4.  Voter-Switch 

The Voter-Switch device interconnects the SDFTP partitions. Since
the partitions are triplicated, the device must accept triplicated inputs and
produce triplicated outputs. In the Voter-Switch device the triplication is
at the bit level. Each of the triple outputs per bit is either the majority
function of the triple inputs for the bit, or it is one of the triple inputs for
the particular bit that has been selected by one of the Voter-Switch switches.
Selection of the majority function or switch input is controlled by the
Voter-OR-Switch (VOS) bit of the Voter-Switch Control Register. The selection
of which of the three inputs is used is under the control of the switch control
bits C1, C2 and C3, which are also stored in the Control Registers. To assure
independence of each of the three sets of circuits, the Control Register is
triplicated and each copy is used to control only one of the three circuits
that generate each bit's triplicated output. This is shown schematically in
Figure 43, where a single bit of the Voter-Switch circuitry is shown with the
three sets of Control Registers, which are used for all of the nine bits of
input. The triplicated outputs for each bit are produced by three three-level
networks. The network output is developed by an OR-gate, which receives the
outputs of the gated voter and the gated switch selected inputs. These inputs
are gated by the second level of the network, which is controlled by the
setting of the VOS bit of the Control Register.




Figure  42.  S.C.C.  With  Mask  Pin  Usage 

A "one" value for the VOS selects the switches, while a "zero" selects the
voter that produces the majority function of the inputs. The selection of the
particular input, when the switches have been selected (the VOS is a one), is
determined by the C bits of the Control Register. Setting C1 to a one selects
the first of the three inputs by means of the top third-level AND gate.
Similarly, setting C2 and C3 to ones selects the second and third inputs,
respectively. Hence, any input can be routed to any one of the three outputs
for this bit, and the same can be accomplished in the other bit locations using
their bit networks, which are identical. Excluding the Control Registers, the
Voter-Switch consists of nine bits of triplicated networks as shown in
Figure 43, for a total of 27 selector slices, which are identical except for
their inputs.

Changes in the interconnection between partitions are accomplished
by changing the three Control Register contents. Each Control Register
controls one-third of the networks and each can be changed independently of
the others. In order that these changes in the interconnections do not affect
the information passing through the Voter-Switch, the changes should be
scheduled during periods of time when information is not being exchanged
between partitions.
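
The selection logic of one selector slice, as described for Figure 43, can be
written as a Boolean function. The Python sketch below is illustrative; the
function and argument names are hypothetical, and it models one of the 27
slices under control of one copy of the Control Register.

```python
def majority(i1: int, i2: int, i3: int) -> int:
    """Majority function of three one-bit inputs."""
    return (i1 & i2) | (i1 & i3) | (i2 & i3)


def selector_slice(i1, i2, i3, vos, c1, c2, c3) -> int:
    """Output of one selector slice.

    vos = 0 passes the voted value; vos = 1 passes whichever of the
    three inputs is selected by the switch control bits c1, c2, c3.
    """
    voted = majority(i1, i2, i3) & (vos ^ 1)          # gated voter path
    switched = ((c1 & i1) | (c2 & i2) | (c3 & i3)) & vos  # gated switch path
    return voted | switched                           # output OR-gate


# Voter selected: the single disagreeing input is outvoted.
print(selector_slice(1, 1, 0, vos=0, c1=0, c2=0, c3=0))  # 1
# Switch selected with C3 set: the third input is passed through.
print(selector_slice(1, 1, 0, vos=1, c1=0, c2=0, c3=1))  # 0
```

Because each of the three copies of the Control Register drives its own
slice, each of the three outputs for a bit can be configured independently of
the other two.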

Loading of the Control Registers is accomplished by serially shifting
in the four-bit commands using the shift-in clock. Since each Control Register
has its own separate input and clock line, the Control Registers can be
individually loaded.

The  Voter-Switch  pin  requirements  are  shown  in  Figure  44.  The 
device  complexity  is  of  the  order  of  400  gates. 

5.  RRU  Clock  Controller 

The RRU Clock Controller provides a comprehensive control over
the RRU clock signals. It is designed to prevent inadvertent clock signals
from changing the SDFTP control and diagnostic registers, except during
certain prescribed intervals. The signals involved are the Self-Checking
Checker Shift Clock, the Voter-Switch Command Clock, and the Self-Checking
Checker With Mask Mask Clock. Besides regulating these signals, the device
provides additional drive for these signals.

Control is exercised over the clock signals by inhibiting each clock
signal in a two-input AND-gate as shown in Figure 45. This inhibiting signal
can be removed in three different ways, corresponding to three different modes
of operation for the clock controller. All three modes are conditioned by a
code word circuit that must receive the correct mode code word before the mode
inhibit can be removed. On receipt of one of these code words and proper
decoding, the designated mode can be commanded. This circuit is shown in the
middle of Figure 45. It receives the code word a nibble (4 bits) at a time and
shifts the 32-bit code word into the register under the control of the code word
clock.
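
The nibble-serial loading of the code word can be modeled as a 32-bit shift
register followed by a decoder. The Python sketch below is illustrative only:
the code word value is invented for the example, the most-significant-nibble-
first shift order is an assumption, and the class and method names are
hypothetical.

```python
class CodeWordRegister:
    """32-bit register loaded four bits at a time, with a mode decoder."""

    WIDTH_MASK = 0xFFFFFFFF

    def __init__(self, mode_words):
        # mode_words maps a mode number to its 32-bit code word.
        self.mode_words = dict(mode_words)
        self.value = 0

    def clock_nibble(self, nibble: int) -> None:
        """One code word clock: shift four new bits into the register."""
        if not 0 <= nibble <= 0xF:
            raise ValueError("a nibble is four bits")
        self.value = ((self.value << 4) | nibble) & self.WIDTH_MASK

    def decoded_mode(self):
        """Return the mode whose code word is held, or None."""
        for mode, word in self.mode_words.items():
            if self.value == word:
                return mode
        return None


reg = CodeWordRegister({1: 0xA5C3F00F})   # hypothetical code word for mode 1
for n in (0xA, 0x5, 0xC, 0x3, 0xF, 0x0, 0x0, 0xF):
    reg.clock_nibble(n)
print(reg.decoded_mode())                 # 1
```

Requiring all eight nibbles to match before any mode is decoded is what makes
an inadvertent sequence of output pulses unlikely to enable the clocks.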

The first clock mode is the normal or unfailed RRU mode. In this
mode, the three RRU S.C.C. With Mask signals are used to set flip-flops
whose outputs are voted by a majority gate as shown in the top of Figure 45.
The flip-flops capture the S.C.C. With Mask error signal, since it will be lost
as soon as the S.C.C. snapshot registers are read out. The voter output, which
produces the majority function of the three RRU S.C.C. With Mask error signals,
then changes the clock inhibit signal to an enable signal. The cascade of the
two-input AND and the OR provides the code word conditioning for this path
(mode 1) and the alternative methods of enabling the output clock AND-gate.

The second mode is the single RRU interrupt signal, which permits a
single RRU computer to turn the clocks on after supplying the correct clock
word. As in the first mode, the interrupt signal is used to set a flip-flop, which
actually drives the clock enabling circuit consisting of a two-input AND gate and
a three-input OR (see Figure 45). Again, the AND gate provides the command
word conditioning of this mode signal, while the OR gate is used to merge the
alternate clock enabling signals.

Figure 44. Voter and Switch Pin Requirements


The third mode, the test mode, is provided to permit the RRU to be tested.
The voter circuit at the bottom of Figure 45 produces the majority function of
the three RRU computer test signals. The voter output is conditioned by both the
code word decoder output and the main processor test signal, so that testing
can be performed only when both the RRU computers and the bit-slice processor
controlled by the RRU have called for a clock turn-on. This conditioning is
accomplished by the three-mode clock enable paths.

After  each  enabling  of  the  clock  signals,  the  circuit  must  be  restored 
to  its  original  inhibited  condition  by  the  reset  signal.  In  the  SDFTP  this 
signal  is  generated  by  an  RRU  computer. 
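
The three enable paths and the reset described above can be summarized in a
behavioral sketch. It assumes the mode numbering 1 to 3 used in the text and
takes the decoded mode as an input; the class, method, and parameter names are
hypothetical and are not taken from Figure 45.

```python
def majority(a, b, c):
    """Two-out-of-three vote on boolean inputs."""
    return (a and b) or (a and c) or (b and c)


class ClockController:
    """Sketch of the clock enable logic: three conditioned paths merged by an OR."""

    def __init__(self):
        self.scc_error_ff = [False, False, False]  # latched S.C.C. With Mask errors
        self.interrupt_ff = False                  # latched single RRU interrupt

    def latch_scc_error(self, channel):
        self.scc_error_ff[channel] = True

    def latch_interrupt(self):
        self.interrupt_ff = True

    def reset(self):
        """Restore the inhibited condition (issued by an RRU computer)."""
        self.scc_error_ff = [False, False, False]
        self.interrupt_ff = False

    def enable(self, decoded_mode, rru_test=(False, False, False), main_test=False):
        # Each path is ANDed with its own code word decode, then the paths are ORed.
        mode1 = decoded_mode == 1 and majority(*self.scc_error_ff)
        mode2 = decoded_mode == 2 and self.interrupt_ff
        mode3 = decoded_mode == 3 and majority(*rru_test) and main_test
        return mode1 or mode2 or mode3

    def gate(self, clock_in, decoded_mode, **test_inputs):
        """Output clock: the input clock ANDed with the enable signal."""
        return clock_in and self.enable(decoded_mode, **test_inputs)
```

In this model a clock pulse reaches the output only when a mode condition and
its matching code word are both present, and every enable must be cleared by
`reset()` before the next one.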

The clock controller device controls eight clock signals, as shown in
Figure 46. The proposed circuit is a 40-pin package and has a complexity of
about 400 gates.


Figure 46. RRU Clock Controller Pin Assignment


6.  Microsequencer 


The proposed microsequencer chip is intended for use in the
SDFTP microsequencer partition, but it can perform the microprogram
sequence control for any similar microprogrammed processor. In the SDFTP
it reduces the parts count and number of devices significantly because of the
triplication of the microsequencer partition. In this partition it replaces
four high-speed counter devices, two quadruple two-to-one multiplexers, and
an eight-to-one multiplexer.

The  microsequencer  extends  the  microprogram  address  field  width 
and  adds  a  microprogram  subroutine  stack.  The  added  capability  is  gained 
without  slowing  the  sequencer  or  adding  excessive  power  dissipation.  The 
commercially  available  microsequencers  generally  offer  more  sophistication 
than  required  for  a  computer  microprogram  sequencer,  with  the  attendant 
costs  in  power,  speed,  and  reliability.  They  suffer  from  the  need  to  address 
the  largest  possible  market. 

The proposed microsequencer contains a microprogram register
and incrementer, a two-word stack for microprogram subroutining, a tally
counter for microprogram loop control, and a condition multiplexer to select
jump tests from six data-dependent conditions. A block diagram of the
sequencer is shown in Figure 47. The function of the circuit is to generate
the next microprogram address, given the current state of the machine.

The possible next addresses, given that the current address is n, are the
next consecutive address n + 1, a jump address input from the control store,
the address at the top of the subroutine stack (return from microsubroutine),
or the fixed address zero used for honoring a program interrupt. The choice
of the next address is controlled by the instruction input to the sequencer
(Table III), the external condition inputs, the tally counter, and the address
input. The tally counter is loaded from the address input; it can be decremented,
and generates a zero tally condition.

The function of the sequencer is as follows. The instruction decode
logic (Table III) generates the control signals to operate the remainder of
the sequencer. In addition to the instruction input (I0 - I2), the logic uses the
output of the condition multiplexer (Figure 48) and the condition select inputs
(CS0 - CS2). The tally counter (Figure 49) is simply commanded to load from
the address input, to decrement, or to do nothing. It is a 12-bit 1's complement
counter that uses the carry out as a zero detect. The counter is synchronous,
with carry look-ahead logic over a group of four bits. The heart of the sequencer
is the multiplexer, microprogram register, and incrementer logic shown in
Figure 50. The multiplexer chooses, on command from the instruction
decode logic, the input to the microprogram register from either the incrementer,
the stack, the address in, or the constant zero. At each clock the microprogram
register is loaded with the output of the multiplexer, which becomes the
address output to the control store. The incrementer adds "1" to the
microprogram register and makes the result available to the multiplexer and the
stack.


TABLE III. SEQUENCE TABLE

Figure 48. Microsequencer Control


The stack (Figure 51) is a two-word last-in, first-out stack. It is
used to store the return address, n + 1, when a microprogram subroutine is
called. The two-word stack was selected because the control is particularly
simple and the logic implementation is fast. Also, two-level subroutine
nesting is adequate for efficient microprogramming of almost all types of
computer instruction sets. In fact, there is seldom a need for more than one level
of subroutine. Larger stacks are often advocated when the application program
is part of the microprogram, which is not of interest here.
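
The next-address selection described above can be sketched as a small
behavioral model. The contents of Table III are not reproduced in this text, so
the operation names below are hypothetical stand-ins, not the chip's instruction
encoding. The 12-bit address width is an assumption taken from the 12-bit tally
counter that loads from the address input, and raising an error on a third
nested call is a modeling choice, not a statement about the hardware.

```python
from enum import Enum, auto

ADDRESS_MASK = 0xFFF  # assumed 12-bit microprogram address


class Op(Enum):
    """Hypothetical operation names; Table III defines the real encoding."""
    CONTINUE = auto()    # next address is n + 1
    JUMP_IF = auto()     # jump to address input when the selected condition holds
    CALL = auto()        # push n + 1, jump to address input
    RETURN = auto()      # pop the return address from the stack
    LOAD_TALLY = auto()  # load the tally counter from the address input
    LOOP = auto()        # decrement tally and jump until the tally is zero
    INTERRUPT = auto()   # force the fixed address zero


class Microsequencer:
    def __init__(self):
        self.address = 0  # microprogram register
        self.stack = []   # two-word last-in, first-out stack
        self.tally = 0    # loop counter

    def step(self, op, address_in=0, condition=False):
        """Clock the sequencer once and return the next microprogram address."""
        incremented = (self.address + 1) & ADDRESS_MASK
        if op is Op.CONTINUE:
            nxt = incremented
        elif op is Op.JUMP_IF:
            nxt = address_in if condition else incremented
        elif op is Op.CALL:
            if len(self.stack) == 2:
                raise OverflowError("only two levels of nesting are modeled")
            self.stack.append(incremented)
            nxt = address_in
        elif op is Op.RETURN:
            nxt = self.stack.pop()
        elif op is Op.LOAD_TALLY:
            self.tally = address_in & ADDRESS_MASK
            nxt = incremented
        elif op is Op.LOOP:
            if self.tally == 0:
                nxt = incremented
            else:
                self.tally -= 1
                nxt = address_in
        else:  # Op.INTERRUPT
            nxt = 0
        self.address = nxt & ADDRESS_MASK
        return self.address
```

The four sources feeding `nxt` correspond to the incrementer, the address input,
the stack, and the constant zero named in the text.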

The  above  logic  diagrams  specify  the  function  of  the  microsequencer 
chip.  The  approximate  gate  counts  associated  with  each  function  are  as  follows: 

Mux.  Microprogram  Register,  and  Incrementer  -  179 

Tally  Counter  -  144 

Stack  -  180 

Instruction  Decode,  Condition  Mux.  -  35 

These total 528 gates, which is a modest-sized chip, especially
when it is considered that many of the gates are part of multiplexers that
can be realized by a few transistors in some technologies, and when it is
noted that most structures used are quite regular.

IV. SELF TEST


Self-test is used for a number of different reasons in this design. In the
bit-slice processor, it is utilized in the on-line mode to determine whether one
of two identical circuit implementations is faulted and, if so, which one. It is
also used to periodically exercise the error detecting and reconfiguration
circuits to verify that they are still unfailed. In the RRU, the Maskable S.C.C.
trees are similarly exercised, initially to detect errors; if an error is
detected, self-test is then used to locate the source of the error. The RRU
computers are also self-tested to verify an unfailed status.

In the case of error detection, the self-test verifies that the coverage is
in place and operable. In the error location mode, the self-test programs are
not only executed to detect faults but, on completion of the tests, the
precomputed final result can be compared with a pair of results generated by the
SDFTP circuits to determine which circuit is in error. It is this latter usage
that is important for diagnosing second errors in the bit-slice processor
microsequencer and processor array partitions, and in the RRU computers. (Second
errors in the control store partition are resolved by table lookup in the RRU,
as discussed in Section III-F.)

To obtain a more confident estimate of the size of these programs and
their effectiveness, particularly for circuit modules containing LSI devices, a
2901 bit-slice microprocessor device was simulated using the Digitest Version 4
Logic Automated Stimulus and Response (D4LASAR) facility and program
system.

The D4LASAR program automatically generates high-quality diagnostic
tests for complex sequential and combinatorial networks. When applied to an
available gate-level description of the 2901, the program determined that 3600
test vectors were required to obtain 99% fault coverage.

Thus, it is likely that a considerable number of vectors will be required
to test circuits containing embedded LSI devices, such as the 2901, for this level
of coverage. However, lower levels of fault coverage appear acceptable, based
on the reliability predictions determined in Section VI. Thus an acceptable
self-test should be of manageable size if advantage is taken of additional
factors. One of these is to use the information captured at the time of the error
to localize the error and run only those self-test segments that exercise the
parts of the circuit contributing to it.

The self-diagnosing processor demonstration described in the Program
Plan (Section VII) offers an opportunity to better quantify the size and coverage
of these self-test programs.


V. RELIABILITY ENHANCEMENT AND PREDICTION


A.  INTRODUCTION 

The reliability enhancement of the SDFTP, with respect to the simplex
processor, is achieved via static and dynamic redundancy. The bit-slice
processor is partitioned to reduce the size of the circuit modules that are
switched. This was accomplished without incurring undue interface switching
costs. Hence, partition size is a compromise between the reliability
improvement due to smaller partitions, which approach "component" reliability,
and the loss of reliability due to additional interface devices, which are
needed to interconnect the partitions and provide error detection and location
information. The resulting three partitions have roughly equal failure rates, as
shown in Table IV, with only four interfaces that must be monitored and
controlled, as shown in Figure 3.


TABLE IV. PROCESSOR PARTITION FAILURE RATES

PARTITION          FAILURE RATE

MICROSEQUENCER     5.66
CONTROL STORE      7.75
PROCESSOR ARRAY    7.73

Self-diagnosis is achieved through the use of self-checking checkers
that extend the error protection boundary to include not only the partitions but
the checkers themselves. These checkers are colocated at each interface with
the devices that provide the interconnection between the partitions. They
monitor the partition outputs and alert the Reconfiguration and Recovery
Unit (RRU) to an error as soon as one occurs. These signals limit
error propagation to the partition in which the error occurs by inhibiting the
clock. Thus the bit-slice processor error state is maintained until the RRU
computer can use the checker snapshots to locate the partition interface that is
reporting an error.

After the error detection and location functions have been accomplished,
the RRU reconfigures the bit-slice processor using the checker error
information and the SDFTP status that it maintains. The RRU achieves the
reconfiguration by emitting reconfiguration commands to the Voter-Switches
located at each interface. These commands select either the voter or the switch
connection between adjacent partitions.

At the start of each mission, the processor is configured with the
voters providing the connection between the partitions. After two errors are
detected at an interface, the RRU commands the Voter-Switch to change over to the
switch connection. Once reconfiguration has been completed, the RRU initiates
the recovery process in the SDFTP in one of two ways: either the RRU vectors
the bit-slice processor back to the last rollback point, or it vectors the
processor to the self-test routine, which falls through to the rollback point in
the application program if no errors are detected during the self-test exercise.

The calculation of the reliability of the SDFTP depends on the model of
the design developed and the failure rates of the devices employed in the design.
The model of the SDFTP design is described in the next unit. The discussion of
the reliability estimate is concluded with a discussion of the MIL-HDBK-217B
Handbook method used to calculate the failure rates.

B.  SDFTP  RELIABILITY  MODEL 

The SDFTP reliability model consists of the serial reliability of the
bit-slice partitions and the RRU. Each partition's reliability module is modeled
as the cascade of the following devices:

1) an input S.C.C.

2) bit-slice processor partition

3) an output S.C.C.

4) Voter-Switch.


Since the bit-slice partitions are triplicated, the partition reliability expression
is of the following form:

Partition Reliability = R^3 + 3R^2 (1 - R) + 3R (1 - R)^2          (1)

where R is the reliability of a single partition.
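
As a quick numerical check of Eq. 1, the expression can be evaluated directly.
The function name is illustrative only. Note that the three terms cover zero,
one, and two failed replicas, so the sum equals 1 - (1 - R)^3; the third term is
what the switch connection adds over a plain majority voter.

```python
def triplicated_reliability(r):
    """Eq. 1: reliability of a triplicated unit that survives up to two failures."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    q = 1.0 - r
    all_good = r ** 3            # no replica failed
    one_failed = 3 * r ** 2 * q  # one of three failed (voter connection)
    two_failed = 3 * r * q ** 2  # two of three failed (switch connection)
    return all_good + one_failed + two_failed
```

For example, a single-replica reliability of 0.9 gives 0.729 + 0.243 + 0.027 =
0.999.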


The reliability expressions for the input and output S.C.C. and for the
Voter-Switch voter, switch, combiner and command register are of a similar form.
All but the voter and switch are considered to be in series in the reliability
model. Because of the switching from the voter connection to the switch after the
second error detection, the voter reliability modifies only the first two terms
of the partition reliability expression, Eq. 1, while the switch modifies the
last term of the expression. Hence, the reliability equation for the partition
module is:

R_{module} = \{ [R_p^3 + 3R_p^2 (1 - R_p)]
                  [R_v^3 + 3R_v^2 (1 - R_v) + 3R_v (1 - R_v)^2]
              + 3R_p (1 - R_p)^2
                  [R_s^3 + 3R_s^2 (1 - R_s) + 3R_s (1 - R_s)^2] \}
             \times \prod_{i \in \{IC, OC, C, CM\}}
                  [R_i^3 + 3R_i^2 (1 - R_i) + 3R_i (1 - R_i)^2]          (2)

where the reliability for the devices is

R_p - partition

R_v - voter of the Voter-Switch device

R_s - switch of the Voter-Switch device

R_{IC} - input S.C.C.

R_{OC} - output S.C.C.

R_C - combiner of the Voter-Switch

R_{CM} - command register of the Voter-Switch.


The microsequencer and control store-pipeline register follow this
form exactly. However, the processor array requires that Eq. 2 be modified
to account for the fact that it has three interfaces: one input and two output.
The modification consists of multiplying Eq. 2 by the appropriate third interface
devices. Since the processor array-microsequencer interface is identical to
that modeled in Eq. 2, the processor array-memory interface will be considered
as the added interface. Thus, the first and second terms of Eq. 1 are now
modified by two voter expressions. One is for the processor array-microsequencer
interface Voter-Switch device and the other is for the processor array-memory
interface device. The third term of Eq. 1 is modified by the two switch
reliabilities of the two processor array output interfaces. Thus this part of the
expression becomes,


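P.A. Partition with voter and switch =
      [R_p^3 + 3R_p^2 (1 - R_p)]
        \times [R_{vm}^3 + 3R_{vm}^2 (1 - R_{vm}) + 3R_{vm} (1 - R_{vm})^2]
        \times [R_{vo}^3 + 3R_{vo}^2 (1 - R_{vo}) + 3R_{vo} (1 - R_{vo})^2]
    + 3R_p (1 - R_p)^2
        \times [R_{sm}^3 + 3R_{sm}^2 (1 - R_{sm}) + 3R_{sm} (1 - R_{sm})^2]
        \times [R_{so}^3 + 3R_{so}^2 (1 - R_{so}) + 3R_{so} (1 - R_{so})^2]          (3)

where

R_p - processor array partition reliability

R_{vm} - processor array-microsequencer Voter-Switch voter

R_{vo} - processor array-memory Voter-Switch voter

R_{sm} - processor array-microsequencer Voter-Switch switch

R_{so} - processor array-memory Voter-Switch switch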

Equation 3 is multiplied by the reliability of the input S.C.C., the
processor array-microsequencer interface S.C.C., the processor array-memory
interface S.C.C., and the Voter-Switch combiner and command register
reliabilities. Each of these circuits is triplicated and therefore modifies Eq. 2
by a factor of the form of Eq. 1. Eq. 4 is the processor array reliability
expression:

P.A. Partition Module Reliability =
    \{ [R_p^3 + 3R_p^2 (1 - R_p)]
         \times [R_{vm}^3 + 3R_{vm}^2 (1 - R_{vm}) + 3R_{vm} (1 - R_{vm})^2]
         \times [R_{vo}^3 + 3R_{vo}^2 (1 - R_{vo}) + 3R_{vo} (1 - R_{vo})^2]
     + 3R_p (1 - R_p)^2
         \times [R_{sm}^3 + 3R_{sm}^2 (1 - R_{sm}) + 3R_{sm} (1 - R_{sm})^2]
         \times [R_{so}^3 + 3R_{so}^2 (1 - R_{so}) + 3R_{so} (1 - R_{so})^2] \}
    \times \prod_{i \in \{IC, OCM, OCO, CM, CO, CMM, CMO\}}
         [R_i^3 + 3R_i^2 (1 - R_i) + 3R_i (1 - R_i)^2]          (4)

where R_p, R_{vm}, R_{vo}, R_{sm}, and R_{so} are defined as before and

R_{IC} - input S.C.C. reliability

R_{OCM} - output S.C.C., processor array-microsequencer interface

R_{OCO} - output S.C.C., processor array-memory interface

R_{CM} - processor array-microsequencer Voter-Switch combiner

R_{CO} - processor array-memory Voter-Switch combiner

R_{CMM} - processor array-microsequencer Voter-Switch command register

R_{CMO} - processor array-memory Voter-Switch command register.

The processor array and the microsequencer partitions must be
modified to account for the use of self-test to diagnose which of two partitions
has failed after two errors have been detected that affect the same partition.
Since the coverage of the self-test programs is not complete, the term that
represents the condition of two partitions failed and one unfailed is modified by
a factor that accounts for this incomplete detection capability. This coverage
factor is the conditional probability that, given that an error has occurred, the
error is detected. For the microsequencer this term of Eq. 2 becomes,

3R_p (1 - R_p)^2 [R_s^3 + 3R_s^2 (1 - R_s) + 3R_s (1 - R_s)^2] CF_s ,          (5)

where the terms are as defined for Eq. 2 and CF_s is the microsequencer
self-test coverage probability.

For the processor array module, the corresponding term from Eq. 4 is,

3R_p (1 - R_p)^2 [R_{sm}^3 + 3R_{sm}^2 (1 - R_{sm}) + 3R_{sm} (1 - R_{sm})^2]
    \times [R_{so}^3 + 3R_{so}^2 (1 - R_{so}) + 3R_{so} (1 - R_{so})^2] CF_{PA} ,          (6)

where the terms are defined as in Eq. 4 and CF_{PA} is the processor array
self-test coverage probability.
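
The effect of the coverage factor can be explored numerically with a short
sketch that follows the verbal description of the partition module: the voter
modifies the zero-failure and one-failure terms, the switch and the coverage
factor modify the two-failure term (Eq. 5), and the remaining triplicated
devices multiply the result in series. The function and parameter names are
hypothetical, and the structure is an interpretation of the text rather than a
transcription of the printed equations.

```python
def tmr_form(r):
    """Expression of the form of Eq. 1 for a triplicated device."""
    q = 1.0 - r
    return r ** 3 + 3 * r ** 2 * q + 3 * r * q ** 2


def partition_module_reliability(r_p, r_v, r_s, series=(), coverage=1.0):
    """Sketch of a partition module with voter, switch, and self-test coverage.

    r_p      -- reliability of one partition replica
    r_v, r_s -- reliability of one voter / one switch replica
    series   -- reliabilities of the other triplicated devices in series
                (input S.C.C., output S.C.C., combiner, command register)
    coverage -- self-test coverage factor applied to the two-failure term
    """
    q_p = 1.0 - r_p
    # Zero or one failed replica: connection is through the voter.
    voted = (r_p ** 3 + 3 * r_p ** 2 * q_p) * tmr_form(r_v)
    # Two failed replicas: connection is through the switch, and self-test
    # must identify the surviving replica.
    switched = 3 * r_p * q_p ** 2 * tmr_form(r_s) * coverage
    result = voted + switched
    for r in series:
        result *= tmr_form(r)
    return result
```

With perfect interface devices, a coverage of 1.0 reproduces Eq. 1, while a
coverage of 0.0 removes the two-failure term entirely and leaves the familiar
majority-voting expression R^3 + 3R^2 (1 - R).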


The reliability of the three partitions of the SDFTP is then the product
of the partition module reliabilities, as given in Eq. 7,

Bit-Slice Processor Reliability = R_m \times R_{cs} \times R_{PA}          (7)

where

R_m - microsequencer partition module reliability

R_{cs} - control store partition module reliability

R_{PA} - processor array partition module reliability


The overall reliability of the SDFTP is the serial reliability of the
bit-slice processor partitions and the RRU, which includes the Maskable S.C.C.
and the RRU computer, together with the clock controller. Since there are three
strings in the RRU, the reliability expression for the RRU is:

RRU Reliability = (R_T R_{RRC} R_{CC})^3
                + 3 (R_T R_{RRC} R_{CC})^2 (1 - R_T R_{RRC} R_{CC})
                + 3 R_T R_{RRC} R_{CC} (1 - R_T R_{RRC} R_{CC})^2          (8)

where R_T is the reliability of the Maskable S.C.C. tree

R_{RRC} is the reliability of the RRU computer including its
memory/I/O buffers

R_{CC} is the reliability of the clock controller.

The SDFTP reliability is then,

SDFTP Reliability = R_m \times R_{cs} \times R_{PA} \times R_{RRU} ,          (9)

where R_{RRU} is the reliability of the RRU.

Once  the  reliability  model  has  been  established,  the  reliability  of  the 
design  can  be  calculated  using  the  device  failure  rates. 
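
Equations 8 and 9 are straightforward to evaluate once per-device reliabilities
are available. The sketch below is illustrative only: the function names are
hypothetical, and converting a failure rate to a reliability with a constant
failure rate (exponential) model is an assumption that is customary with
handbook failure rates but is not stated explicitly in this section.

```python
import math


def reliability_from_rate(failures_per_million_hours, hours):
    """Assumed exponential model: R = exp(-lambda * t)."""
    return math.exp(-failures_per_million_hours * 1.0e-6 * hours)


def tmr_form(r):
    """Expression of the form of Eq. 1 for a triplicated string."""
    q = 1.0 - r
    return r ** 3 + 3 * r ** 2 * q + 3 * r * q ** 2


def rru_reliability(r_tree, r_computer, r_clock):
    """Eq. 8: three RRU strings, each a series of tree, computer, and clock."""
    string = r_tree * r_computer * r_clock
    return tmr_form(string)


def sdftp_reliability(r_m, r_cs, r_pa, r_rru):
    """Eq. 9: serial reliability of the three partition modules and the RRU."""
    return r_m * r_cs * r_pa * r_rru
```

A typical use would compute each string element with `reliability_from_rate`
for the mission duration of interest and pass the results to `rru_reliability`.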

C. FAILURE RATE CALCULATIONS

The reliability assessment performed on this program follows that
given in Military Standardization Handbook MIL-HDBK-217B, Reliability
Prediction of Electronic Equipment. Device failure rates were determined using
the expression given there for monolithic solid state integrated circuit devices:

\lambda = \pi_L \pi_Q (C_1 \pi_T + C_2 \pi_E)          (10)

where

\lambda - is the device failure rate in failures/10^6 hours

\pi_L - is the device learning factor. For this projection, \pi_L was
assumed to be 1 for all devices (in production).

\pi_Q - is the quality factor. \pi_Q was assumed to be 2, corresponding to
quality level B, MIL-M-38510, Class B (JAN).

\pi_T - is the temperature acceleration factor. An average junction
temperature of 75°C was assumed, resulting in a \pi_T of 1.6.

\pi_E - is the application environment factor. This was selected as 6.0,
corresponding to an airborne, uninhabited environment.

The complexity factors for the SSI and MSI logic and the ROM memory were
determined using the gate counts and tables given in the Handbook. For the
LSI devices, where gate estimates were not available, manufacturer
information was used where available and, where not available, a gate count was
estimated based on the logic equivalents. Complexity factor projections for
C_1 and C_2 were developed using the Handbook per-gate values for the optimum
level of integration. This had the effect of making the per-gate complexity
factors less sensitive to gate count for the device complexities of interest to
this program, and reduced the effect of the gate count uncertainty of the
highly integrated LSI devices, such as the 8048 computer.
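
The failure rate expression can be evaluated with the factor values stated
above. In this sketch the default factor values are the ones quoted in the text,
while the complexity factors passed in the example are placeholders chosen only
to show the arithmetic; they are not Handbook values for any device.

```python
def device_failure_rate(c1, c2, pi_l=1.0, pi_q=2.0, pi_t=1.6, pi_e=6.0):
    """Failure rate in failures per 10^6 hours: pi_L * pi_Q * (C1*pi_T + C2*pi_E).

    Defaults are the factor values quoted in the text; c1 and c2 are the
    device complexity factors and must be supplied by the caller.
    """
    return pi_l * pi_q * (c1 * pi_t + c2 * pi_e)


# Placeholder complexity factors, for illustration only.
example_rate = device_failure_rate(c1=0.01, c2=0.02)
```

With these placeholder inputs the result is 2 x (0.01 x 1.6 + 0.02 x 6.0) =
0.272 failures per 10^6 hours, which shows how strongly the environment factor
weights the C_2 term.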

D.  RELIABILITY  ESTIMATES 

Using the device failure rates calculated as described above, the
reliability of the simplex processor, consisting of a microsequencer, control
store and processor array, was calculated. The corresponding probabilities
of failure for one- to 10-hour missions were calculated and are shown in
Figure 52. The SDFTP probability of failure for various coverage factors
is plotted in Figure 53.


Figure 52. Simplex Processor Probability of Failure
(probability of failure versus mission duration in hours)


Figure 53. SDFTP Probability of Failure
(probability of failure versus mission duration in hours)

VI.  COMPARISON  OF  SDFTP  AND  SIMPLEX  PROCESSORS 


The Self-Diagnosing Fault-Tolerant Processor (SDFTP) provides "failed
op²" fault tolerance. In comparison with the simplex design, which cannot
tolerate a fault, it is significantly superior, especially for fault tolerant
applications such as flight control. For short missions, less than 10 hours,
it has significantly enhanced reliability compared with the simplex design.
Its failure rate for a two-hour mission is nearly four orders of magnitude
less than that of the simplex design, as described in Section V.

Testability and maintainability of the SDFTP are significantly improved
over the simplex design, since it maintains an error history and up-to-date
status of the SDFTP during the entire flight, which can be utilized to decrease
repair times. The incorporation of self-test programs, coupled with
self-checking checkers and partition interface scanning registers, significantly
increases the diagnosis capability, since the error reporting is to a much finer
scale than it is in the simplex design.

In performance and ease of application the two designs are comparable.
The rate of instruction execution of the SDFTP should approach that
of the simplex design, provided that the processor control circuitry does not
entail large delays. Since the instruction repertoires of the two designs are
nearly the same, the ease of programming should be nearly the same. It is
intended that the SDFTP have some additional instructions to ease the
reconfiguration and recovery process.

Since the improved fault tolerance and reliability of the SDFTP are achieved
via redundancy, the SDFTP requires much larger resources than the simplex
processor. As listed in Table V, the simplex processor requires only 45
devices (without the microsequencer device), while the SDFTP requires about
4.4 times as many devices. Most of the additional parts are required to
implement the triplication employed with the bit-slice processors and the RRU
error processing channels. The remainder are needed to implement the checkers,
partition interconnection devices and the dynamic redundancy control. In terms
of the number of different devices (parts count), the simplex processor requires
just 12 different devices, all of which are commercially available. In contrast,
the SDFTP is implemented with 19 different devices, of which four are special
designs, as shown in Table VI. Both the simplex and the SDFTP would benefit
from the special microsequencer device design, since it would reduce the
simplex parts count by six and the SDFTP count by 18. The microsequencer device
would also eliminate three different parts, thereby reducing the parts types by
two. The SDFTP is much more extensively integrated than the simplex, with
nearly half of the devices at the LSI level of integration.

TABLE V. COMPONENT COMPARISON OF SIMPLEX VERSUS SDFTP

FUNCTION                                                    SIMPLEX   SDFTP

BIT-SLICE MICROPROCESSOR   LSI DEVICES                      16        3(16)
                           MSI DEVICES                      26        3(26)
                           SSI DEVICES                      3         3(3)
                           TOTAL                            45        135
CHECKERS                   S.C.C. (LSI)                               21
RECONFIGURATION            VOTER-SWITCH (LSI)                         10
RRU                        MONOLITHIC MICROCOMPUTER (LSI)             27
                           CLOCK CONTROLLER                           3
MISC.                                                                 12
OVER-ALL TOTAL                                                        208

TABLE VI. PARTS COUNT COMPARISON, SIMPLEX VERSUS SDFTP

SIMPLEX    SDFTP

LS00       LS00
LS04       LS04
LS86       LS86
LS138      LS138
LS151      LS151
LS158      LS158
LS163      LS163
LS174      LS174
LS253      LS253
2902       2902
2901       2901
5341       5341
           S.C.C. WITHOUT MASK
           S.C.C. WITH MASK
           VOTER-SWITCH
           CLOCK-CONTROLLER
           8048
           8243
           8355


Storage requirements for the two designs are difficult to compare. The
simplex processor has nothing equivalent to the RRU computer-storage devices.
In the bit-slice processor area, the requirements are hard to quantify for the
reasons cited in Section IV. However, it is believed that, because the SDFTP
is partitioned into smaller circuit modules than the simplex processor, the
SDFTP can use smaller diagnostic programs than the simplex processor, with
higher coverage.

Another advantage of the SDFTP is the absence of a hard-core problem,
since two replicas should be operable under the single fault-at-a-time
assumption.

VII. PROGRAM PLAN


It is recommended that a prototype self-diagnosing fault tolerant
processor be built to demonstrate the concepts and techniques involved, and to
indicate the resources required for their implementation. This demonstration
will indicate whether the self-diagnosing processor has the ability to operate
correctly through the introduction of faults in the processor, and whether it can
execute the prescribed tasks in a timely fashion. These experiments will be
designed to demonstrate the following qualities:

•  degree  of  tolerance 

•  comprehensiveness  of  protection 

•  responsiveness  of  error  processing 

•  types  of  fault  coverage 

•  compatibility  with  LSI  implementation 

The ability to protect the processor from both single and multiple errors, single
and double fault occurrences of the same fault without intervening repair, and
consistent and inconsistent types of errors will be demonstrated. The ease and
variety of fault insertion are important attributes because they allow the
effectiveness of the error detection, error location, reconfiguration, and
recovery capabilities of the processor to be readily exhibited. Thus the ability
to display the state and readiness of the demonstrator, as well as the fault
history, is an important consideration in developing an effective demonstrator
presentation.

A.  DEMONSTRATOR  DEVELOPMENT  PLAN 

The recommended approach consists of a two-phase development program.
The objective of this program is to construct a demonstrator, together with its
associated demonstration programs, which will show the operation of the
self-diagnosing processor under various conditions of fault. The first phase
will be concerned with definition and design specifications, and the experiments
that can be run on the demonstrator. The resulting definition will then be used
to establish the requirements and specifications of the demonstrator. The
self-diagnosing processor will be built during the second phase, according to the
specifications arrived at in Phase I; the result will be a self-diagnosing
processor and its associated computer programs to demonstrate the capabilities
of the processor.

1. Definition, Design and Specification Phase

The following tasks shall be accomplished during this phase:

Define Demonstration System

A plan for demonstrating the capabilities of the Self-Diagnosing Fault
Tolerant Processor (SDFTP), including the experiments that are to be performed
to illustrate the features and range and extent of the fault tolerance desired by
the Air Force, will be developed. A functional description of the Self-Diagnosing
Fault Tolerant Demonstration System will be provided.

Design 

Using the Air Force approved SDFTD plan, the functional specifications
of the demonstrator, including processor design demonstration requirements and
principal interfaces, will be established. This task will have two distinct parts:
preliminary design and detailed design.

In the preliminary design work the Self-Diagnosing Design Techniques
Demonstrator (SDFTD), including the self-diagnosing fault tolerant processor
(SDFTP), will be designed to a level that clearly shows the technical adequacy
of the selected approach and establishes the ability of the SDFTD to demonstrate
the fault tolerance of the processor.

This preliminary design effort will result in an SDFTP hardware
development specification that covers (1) all essential system functional
characteristics, (2) necessary interface characteristics, (3) specific
designation of the functional characteristics to key configuration items, and
(4) tests that will verify that the specified performance has been achieved. A
Computer Program Development (built to) Specification will also be written that
describes, in operational, functional, and mathematical language, all of the
requirements necessary to design the required computer programs in terms of
performance criteria.

In the detailed design portion the SDFTD will be designed to a level
that  clearly  shows  that  all  design  requirements  are  satisfied,  that  the  design  is 
essentially  complete,  and  that  the  fabrication  drawings  are  ready  for  release. 
This  work  will  culminate  with  a  presentation  of  the  detailed  design  to  the  Air 
Force  for  approval.  The  SDFTD  detailed  hardware  requirements  will  also  be 
developed  during  this  part  of  the  effort.  These  requirements  shall  be  specified 
in  a  hardware  functional  description  that  establishes  the  performance,  design 
and  fabrication  requirements.  Design  drawings  will  be  provided  to  good  com¬ 
mercial  practice.  The  SDFTP  computer  programs  will  be  described  in  a 
Computer  Program  Product  Specification  that  provides  a  summary  of  the  pur¬ 
pose  and  scope  of  the  specification  and  a  review  of  the  major  functions.  The 
requirements  section  will  provide  for  a  functional  allocation  description,  func¬ 
tional  description,  storage  allocation,  functional  flow  diagram,  program  in¬ 
terrupts,  and  control  logic  description. 

Program  Plan 

The Phase I report will include a plan for the implementation and
demonstration of the self-diagnosing processor designed during Phase I. This
plan  will  include  a  description  and  schedule  of  the  major  events  in  hardware 
and  software  construction,  test  and  demonstration.  Estimates  of  material  cost, 
labor  by  type,  and  schedule  will  be  included. 

2. Fabrication, Test and Demonstration Check-Out Phase


Working from the detailed design information developed in Phase I,
the demonstrator will be built and the associated computer programs will be
written. The custom LSI devices that were designed during this development
will be implemented in small-scale integrated (SSI) and medium-scale
integrated (MSI) circuit form. These devices, as well as those needed to
implement the remainder of the design, will be selected from commercially
available products. Layout and fabrication of the custom LSI devices is planned
following successful operation of the demonstrator.

Checkout of the hardware and the software is scheduled to proceed
concurrently, using available development systems. Integration of the hardware
and the software will be accomplished as the individual subsystems, programs
and routines are tested and checked out. The completely integrated demonstrator
will be tested to verify that the demonstrator performs as specified and that the
fault insertion experiments can be successfully performed.

The  proposed  schedule  for  the  development  of  the  demonstrator  is 
shown  in  Figure  54.  Significant  milestones  are  also  indicated.  The  definitive 
plan  and  schedule  for  this  second  program  phase  will  be  refined  at  the  end  of 
the Definition, Design and Specification phase, as indicated in the Program Plan
above.

A. INTRODUCTION AND SUMMARY


Two airborne application areas were selected for the baseline
requirements. These are airborne flight control and the synthetic aperture ground
map function of airborne multimode radar signal processing. Each application
is examined to determine its processor requirements, beginning with mission
identification and functional analysis leading to the development of the
algorithm flow. Performance analysis of representative tasks is described
and the resource estimates are developed in terms of memory, processor
speed, and complexity, as measured in terms of the variety of operations and
their corresponding execution rates.

The Flight Control Application is considered first in Section B since
it represents a set of requirements that falls within the realization
capabilities of existing LSI devices, such as microprocessors and memories,
configured in a single programmable computer structure. It is estimated that a
high performance control-configured, fly-by-wire aircraft would require less
than 16,000 words of 16-bit wide memory and could be controlled by a
processor capable of executing 300 to 400 thousand operations per second
(KOPS). Because of the safety requirements of this application, quadruple
redundancy coupled with software implemented redundancy management leads to a
sophisticated input/output system that connects the electronic flight control
system to the aircraft control sensors and actuators. The reconfiguration
approach is designed to achieve "failed op-squared" fault tolerance for the
electronics.

The second application, Synthetic Aperture Ground Map Processing of a
Multimode Radar, described in Section C, results in signal processor
requirements that are beyond the capability of current and near future single
conventional microprocessor designs. However, special programmable pipeline
processors and netted sets of microprocessors are believed to be capable of
achieving the performance required. As in many of the other radar signal
processing applications, the core signal processing function has the ability
to generate a doppler frequency analysis of the radar return. For this ground
mapping mode of the multimode radar, a processing rate in excess of
20 x 10^6 complex multiplies per second is required in addition to a signal
processing operation rate in the 1-2 million instructions per second range.
Compared to the flight control application, the multimode memory requirements
are significantly larger and are estimated to fall in the 3.5 million bit range.

This  storage  is  normally  distributed  throughout  the  signal  processor  and  must 
provide  a  high  memory  accessing  rate  capability,  which  is  a  function  of  the 
specific  radar  mode  and  signal  processor  architecture. 


B.  AIRCRAFT  FLIGHT  CONTROL 


This electronic flight control system combines contemporary ideas for
reconfiguration (transient fault recovery, computer self-monitoring) with
conventional hardware redundancy techniques in a basic quadruplex redundant
structure. With appropriate operating software, the system provides the
reliability and fault tolerance that are typically characterized as "failed
op-squared" performance. In addition, the system automatically recovers
from certain transient faults, such as interruption of electrical power, and
reorders itself to obtain the highest available level of redundant operation.

A high performance control-configured vehicle (CCV), fly-by-wire (FBW)
aircraft and its control surfaces are shown in Figure 1-1. Table 1-1 is a summary
of the major aircraft functions with respect to aircraft safety criticality. In
the following, we will be primarily interested in the flight crucial functions
since they are performed in the flight control system.


Figure  1-1.  Flight  Control  Electronic  System  Control  Surfaces 


Transformation  of  the  operational  criteria  into  design  requirements 
for  a  fault  tolerant  (redundant)  digital  computer  system  is  arranged  to 
obviate  single  point  system  failures.  The  design  must  also  meet  the  follow¬ 
ing  fundamental  design  requirements: 

.  Each computer unit shall independently assess its own, and the
   system's, operational status.

.  No computer or combination of computers shall interrupt
   another computer's normal operation.

.  The system's redundant operation must start up, and recover
   from transient fault conditions, without flight crew intervention.

.  The system design must be able to achieve functional
   operation down to a simplex string of operable elements
   and be architecturally expandable to at least quadruplex
   redundancy.

TABLE I-1. FUNCTION SUMMARY FOR APPLICATION MODELS

Flight Crucial Functions

   .  Flutter Suppression
   .  Structural Mode Suppression
   .  Fly-by-Wire Control
   .  Full-time Stability Augmentation

Flight Critical Functions

   .  Category III MLS Autoland

Noncritical Functions

   .  Track Angle Select/Hold
   .  Flight Path Angle Select/Hold
   .  2D/3D/4D Command Generation
   .  Air-Ground Data Link for ATC Communication

   (Above functions provided by Nav/Guidance Computers)

Proceeding from these requirements, a software implemented redundancy
management approach leads to the inclusion of a reconfiguration process
consisting of failure isolation, transient fault recovery, and redundancy
degradation. The redundant channel processes are consolidated at two system
nodes: at the sensor signal input to the control law computations and at the
servo actuator output. The sensor signal selection process is mechanized in
software and the output voting node is a hydromechanical mechanization.
However, the majority of the reconfiguration mechanisms are software processes
designed to achieve system flexibility and adaptability. The hardware
architecture, by virtue of its communication interconnections, is what makes it
practically possible to achieve the benefits of reconfiguration.
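
The software sensor signal selection step can be illustrated with a short
sketch. The report does not state which selection algorithm is used; the
version below assumes a mid-value (median) select over the channels still in
use, with a hypothetical disagreement threshold standing in for failure
isolation and redundancy degradation. The function name, the threshold value,
and the sample readings are all illustrative, and transient fault recovery is
not modeled.

```python
from statistics import median

def select_signal(readings, healthy, threshold=0.5):
    """Return (selected_value, updated_healthy) for one redundant sensor.

    readings  -- one value per channel (hypothetical units)
    healthy   -- list of booleans, True if the channel is still in use
    threshold -- assumed disagreement limit used for failure isolation
    """
    active = [r for r, ok in zip(readings, healthy) if ok]
    if not active:
        raise RuntimeError("no operable channel remains")
    selected = median(active)  # mid-value select over the active channels
    # Redundancy degradation: drop channels that disagree with the selection.
    updated = [ok and abs(r - selected) <= threshold
               for r, ok in zip(readings, healthy)]
    return selected, updated

# Quadruplex example: the fourth channel has failed hard-over.
value, status = select_signal([1.00, 1.02, 0.99, 9.70], [True] * 4)

# Simplex example: only the first channel remains in use.
last_value, last_status = select_signal([2.0, 0.0, 0.0, 0.0],
                                        [True, False, False, False])
```

The same routine covers every level from quadruplex down to a simplex string,
which is the behavior the design requirements above call for.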

The computer unit is replicated on a per channel basis to build a
redundant (in this case, quadruplex) fault tolerant system. A processor and
all channel interface electronics are included in the computer unit. Sensor,
mode control and servo hardware interfaces are dedicated on a channel basis.
All cross-channel communication is accomplished via dedicated one-way
serial digital data buses that independently interconnect each computer to each
other, providing complete electrical isolation between channels. Each computer
exclusively controls the engagement and shutdown of its own servos. A block
diagram of an integrated navigation/guidance/flight control system is shown in
Figure 1-2. The assignment of channels and input/output electronics to the
computer units is shown in Figure 1-3. A more detailed view of a single
computer's sensor and actuator relationship is shown in Figure 1-4.
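
As a small worked example of the interconnection cost, the sketch below
enumerates the dedicated one-way buses needed when every computer sends to
every other computer. It assumes exactly one bus per ordered (source,
destination) pair, which is one reading of the description above; the function
name is illustrative.

```python
from itertools import permutations

def one_way_buses(channels):
    """List every (source, destination) pair that needs a dedicated bus."""
    return list(permutations(range(1, channels + 1), 2))

buses = one_way_buses(4)
print(len(buses))  # 12 one-way buses for a quadruplex system
```

Under this assumption the bus count grows as n(n - 1), so a triplex set would
need 6 buses and a quadruplex set 12.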

The control surface functions are pictured in Figures A-1 through A-9
in Addendum A. They are:

.  Stabilator Functions
.  Trailing Edge Flap (TEF) Functions
.  Leading Edge Flap Functions
.  Rudder Functions (Channels 1 and 2)
.  Rudder Functions (Channels 3 and 4)
.  Aileron Functions (Channels 1 and 2)
.  Aileron Functions (Channels 3 and 4)
.  Nose Wheel Steering Functions
.  Approach Power Control Functions


Figure 1-2. Integrated Navigation/Guidance/Flight Control System


Figure 1-3. Flight Control Electronics Set

The  interfaces  between  the  computer  software  programs  and  hardware  and 
the  rest  of  the  flight  control  system  are  shown  in  Figure  1-5. 

The set of representative control laws for a CCV/FBW application is
shown in Figures 1-6 through 1-17. An overview of the individual pitch, roll,
yaw, flutter, and maneuver control laws for autoland, go-around, and CCV/FBW is
shown in Figure 1-6. The individual control law diagrams are referenced in
Figure 1-6 and presented in Figures 1-7 through 1-15. Sensor and mode control
interface requirements and servo and display interface requirements are shown in
Figures 1-16 and 1-17.

Processor resource estimates for a high-performance FBW aircraft flight
control system designed to meet the foregoing requirements are tabulated in
Table 1-2. The total storage requirements are approximately 13,000 16-bit
words of program storage and 1,300 16-bit words of data memory. The performance
needed is about 320,000 operations per second. These estimates are obtained
through sizing the application on a 16-bit flight control computer having the
instruction repertoire and execution times shown in Addendum B.

C.  MULTIMODE  RADAR  SYNTHETIC  APERTURE  GROUND  MAPPING 

A  multimode  radar  may  have  a  number  of  modes  of  which  the  following 
four  are  typical: 

1)  Medium  PRF  Air-to-Air  Search 

2)  High  Resolution  Spotlight  Mode  Synthetic  Aperture  Radar 
(SAR)  Mapping 

3)  Non-cooperative  Target  Recognition  (NCTR) 

4)  Terrain Following/Terrain Avoidance (TFTA)


The associated radar signal processor should be capable not only of
processing each of the mode returns in real time but also of switching
between any pair of modes in real time without hardware changes.

Figure 1-11. Pitch Command Augmentation Control Law (with


Figure 1-12. Roll Command Augmentation Control Law


Figure 1-13. Yaw Command Augmentation Control Law


Figure 1-11. Pitch Command Augmentation Control Law (with 7 Hold)


Figure 1-16. Sensor and Mode Control Block

TABLE 1-2. MEMORY AND TIMING ESTIMATES

                               Program Memory   Data Memory   Execution Rates
Function                       (Words)          (Words)       (KOPS)

1. Executive                    1400             190            24.00
2. Input Signal Management      1500             300           103.60
3. Control Laws                 4500             500           176.00
4. Outer Loop Control Laws       700             100             8.00
5. Actuator Signal              1000              50             4.00
6. Built-in Test                3000             100              *
7. Data Management               800              50             4.00

Total                          12900            1290           320.40

* Not applicable - either background or offline.
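
The totals in Table 1-2 can be cross-checked with a few lines of code. The
sketch below simply re-adds the rows as they are printed; the dictionary and
function names are illustrative. With the values as they appear here, the
memory columns reproduce the printed totals, while the listed execution rates
add up to 319.6 KOPS rather than the printed 320.40, a small difference that
may come from rounding or from a scanning error in one of the rows.

```python
# Rows of Table 1-2: (program words, data words, KOPS).
# None marks the rate that the table lists as not applicable.
TABLE_1_2 = {
    "Executive": (1400, 190, 24.00),
    "Input Signal Management": (1500, 300, 103.60),
    "Control Laws": (4500, 500, 176.00),
    "Outer Loop Control Laws": (700, 100, 8.00),
    "Actuator Signal": (1000, 50, 4.00),
    "Built-in Test": (3000, 100, None),
    "Data Management": (800, 50, 4.00),
}

def totals(table):
    """Sum each column, skipping rates marked as not applicable."""
    program = sum(row[0] for row in table.values())
    data = sum(row[1] for row in table.values())
    kops = sum(row[2] for row in table.values() if row[2] is not None)
    return program, data, round(kops, 2)

program_words, data_words, kops = totals(TABLE_1_2)
print(program_words, data_words, kops)

# Compare against the "less than 16,000 words" estimate given earlier.
fits_estimate = program_words + data_words < 16_000
print(fits_estimate)
```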

In all air-to-air and air-to-ground modes except TFTA, the underlying
processing principle is a doppler frequency analysis of a coherent radar
spectrum. Consequently, a fully coherent radar has many of the features
needed in a multimode radar. However, the exact processing functions and
sequence of operations differ substantially in the various modes. In the
air-to-air modes, the processor is primarily concerned with the rejection of the
ground return spectrum to allow detection of a comparatively weak return
from an airborne target. In the high-resolution air-to-ground modes, the
processing task requires a high-resolution development of the ground return
spectrum into its doppler components from which a map can be generated.

The synthetic aperture radar (SAR) mode has been selected for signal
processor sizing. Its block diagram is shown in Figure 1-18. The beginning
of the algorithmic flow is the sampled and quantized video developed by a
high-speed analog-to-digital converter. The converted data are presented in
bursts and temporarily stored in a buffer memory as shown in Figure 1-18. After
the burst has been captured in the buffer, the data rate is downshifted and all
of the remainder of the processing for this burst is accomplished in the
remaining pulse repetition interval. This additional processing includes
presumming and motion compensation using data supplied by the radar data
processor. This is followed by two-dimensional transformation of the data and,
finally, by post-processing. Because of the variety of missions and tasks that
can be anticipated, and because the signal processor is part of a multimode
radar processing string, the signal processor must be programmable.


Figure 1-18. Synthetic Aperture Radar Signal Processing

The following implementation assessment is based on the low cost, real
time processor for SAR* systems having a 5 kHz PRF and producing a
512 x 512 point map. The more detailed consideration of the operation of
individual blocks of the signal processor begins with the input buffer.

Buffer 


The  function  of  the  buffer  memory  is  to  downshift  the  high-speed  input 
data  to  the  lower  speed  of  the  rest  of  the  radar  signal  processor.  The  serial 
delay  line  buffer  receives  80  MHz  complex  samples  of  two  separate  antenna 
polarizations.  Each  of  these  samples  is  quantized  to  one  bit  in  both  the 
I  and  Q  channels.  A  total  of  2048  pairs  of  complex  samples  is  serially  stored 
in  four  separate  delay  lines,  one  for  each  pulse  burst.  Subsequently,  this 
stored  information  is  serially  shifted  out  to  the  presummer  at  a  12  MHz  rate. 
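
The buffer figures above can be checked with simple arithmetic. The sketch
below assumes that each of the four delay lines holds one burst of 2048 sample
pairs, that a pair is one complex sample from each polarization, and that the
80 MHz and 12 MHz rates apply per sample pair; these are interpretations of
the text, not statements from it, and the names are illustrative. The 5 kHz
PRF is taken from the implementation assessment above.

```python
PAIRS_PER_BURST = 2048      # complex sample pairs stored per burst
SAMPLES_PER_PAIR = 2        # one sample per antenna polarization (assumed)
BITS_PER_SAMPLE = 2         # one bit each for the I and Q channels
DELAY_LINES = 4             # one delay line per pulse burst
INPUT_RATE_MHZ = 80.0       # assumed to be sample pairs per microsecond
OUTPUT_RATE_MHZ = 12.0
PRF_HZ = 5000.0

def buffer_bits():
    """Total buffer storage in bits under the assumptions above."""
    return (PAIRS_PER_BURST * SAMPLES_PER_PAIR
            * BITS_PER_SAMPLE * DELAY_LINES)

def transfer_time_us(rate_mhz):
    """Time in microseconds to move one burst at the given rate."""
    return PAIRS_PER_BURST / rate_mhz

storage = buffer_bits()
fill_us = transfer_time_us(INPUT_RATE_MHZ)
drain_us = transfer_time_us(OUTPUT_RATE_MHZ)
pri_us = 1e6 / PRF_HZ       # pulse repetition interval in microseconds

print(storage, fill_us, round(drain_us, 1), pri_us)
```

Under these assumptions the buffer holds 32,768 bits, which appears consistent
with the 32K-bit storage entry for the PRF buffer and presummer in Table 1-3,
and shifting a burst out at 12 MHz takes less than one pulse repetition
interval.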

Presummer 


The function of this unit is to select the range samples nearest to the
desired range cells and weight them in proportion to their closeness to the
corresponding azimuth cell before summing them. The presummer processing
sequence, shown in Figure 1-19, initially stores the incoming data in a
set of latches. The next step is to multiply the data by a stored reference
value and add a previous value based on attitude information supplied by the
radar data processor. Data thinning and compression are achieved by
ignoring undesired sample data inputs and by reducing the output to 4-bit
complex words consisting of two bits of I and two bits of Q channel data. As
a consequence, the output data rate is reduced to 1 MHz and the data
handling shifts from serial word processing to block processing of data arrays.
Thus, storage can be centralized to a bulk working store rather than being
distributed in a number of memories located in the individual processing
functions.

Figure 1-19. Presummer


Vector  Processor 


Compensation for aircraft motion is achieved through multiplication of
all of the points by a two-dimensional array. This array is obtained from the
radar data processor, which uses aircraft attitude system information to
generate the array values. Between one million and one and a half million
complex multiplies are required to make these corrections.

Two-Dimensional  FFT 

The two-dimensional transformation of the radar map is done in two steps.
First, the ground map is transformed in the range direction by transforming
512 points from each of the 512 azimuth lines. After the range transformation
has been completed, 512 512-point transforms in the azimuth direction are
executed. These transformations include a computational load of about 6 x 10^6
butterflies per second on 16-bit complex data, having 8 bits of I and 8 bits of
Q. Intermediate storage requirements led to the addition of a 512-word memory
capable of storing complex data in addition to the use of the bulk working store.
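
To give a sense of scale, the butterfly count for one 512 x 512 map can be
worked out directly. The sketch below assumes a radix-2 FFT, for which an
N-point transform needs (N/2) log2(N) butterflies; the report does not say
which FFT structure is used, so this is an illustrative assumption, as are the
function names.

```python
def butterflies_1d(n):
    """Radix-2 butterfly count for one n-point FFT (n a power of two)."""
    stages = n.bit_length() - 1
    if 2 ** stages != n:
        raise ValueError("n must be a power of two")
    return (n // 2) * stages

def butterflies_2d(n):
    """Butterflies for n range transforms plus n azimuth transforms."""
    return 2 * n * butterflies_1d(n)

def seconds_per_map(n, butterflies_per_second):
    """Time to transform one n x n map at a given butterfly rate."""
    return butterflies_2d(n) / butterflies_per_second

per_transform = butterflies_1d(512)
per_map = butterflies_2d(512)
print(per_transform, per_map)
print(round(seconds_per_map(512, 6e6), 3))
```

With these assumptions a single 512-point transform takes 2,304 butterflies
and a full two-dimensional pass takes 2,359,296, so a rate of about 6 x 10^6
butterflies per second corresponds to roughly 0.4 second per map.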

Post  Processor 

The  last  major  functional  unit  in  the  algorithmic  flow  determines  the 
magnitude  of  the  complex  data  transform  outputs  and  integrates  the  resulting 
array.  The  512  x  512  point  maps  require  about  a  million  and  a  half  operations 
per  second  on  data  ranging  up  to  16  bits. 

The total SAR ground map signal processing requirements are summarized
in Table 1-3. These results indicate that a high throughput processor, which
can achieve execution rates in excess of 20 MIPS, is required. Although a
large bulk memory can be employed, there is also a requirement for a number
of smaller distributed RAM memories. Most of the storage is operated in the
random access mode. The input buffer is likely to be implemented most
economically in delay-line form. These estimates are based on a processor
capable of performing the macro instructions listed in Addendum C.

TABLE 1-3. SAR PROCESSING RATE AND STORAGE REQUIREMENTS

                                                                   Storage
Function Name                  Processing Rate                     (Bits)

PRF Buffer and Presummer       6 x 10^6 complex multiplies         32.0K
Vector Processor               1.5 x 10^6 complex multiplies       4.0K
  (Motion Compensation)
FFT (2)                        12 x 10^6 butterflies/sec           8.0K
Bulk Memory                    2 x 10^6 transfers/sec              2.5M
Post Processor                 1.5 x 10^6 operations/sec           64K

Total                          20 x 10^6 complex multiplies        3.5M
                               + 2 x 10^6 memory accesses/sec
                               + 1.5 MIPS

Figure A-1. Stabilator Functions

Figure A-2. Trailing Edge Flap (TEF) Functions

Figure A-4. Rudder Functions


DIGITAL  PROCESSING  SECTION 


ADDENDUM  B 


FUNCTIONAL  LISTING  OF  INSTRUCTIONS 


LOAD/STORE INSTRUCTIONS

Mnemonic   Instruction Description              Execution Time

LDU        Load UR from Program Memory          2.0
LDUS       Load UR from Scratchpad Memory       1.5
LULB       Load UR (Left Byte) Immediate        1.25
LURB       Load UR (Right Byte) Immediate       1.25
LDL        Load LR from Program Memory          2.0
LDLS       Load LR from Scratchpad Memory       1.5
LLLB       Load LR (Left Byte) Immediate        1.25
LLRB       Load LR (Right Byte) Immediate       1.25
LDA        Load XA from Program Memory          2.0
LDAS       Load XA from Scratchpad Memory       1.5
LALB       Load XA (Left Byte) Immediate        1.25
LARB       Load XA (Right Byte) Immediate       1.25
LDB        Load XB from Program Memory          2.0
LDBS       Load XB from Scratchpad Memory       1.5
LBLB       Load XB (Left Byte) Immediate        1.25
LBRB       Load XB (Right Byte) Immediate       1.25
LDC        Load XC from Program Memory          2.0
LDCS       Load XC from Scratchpad Memory       1.5
LCLB       Load XC (Left Byte) Immediate        1.25
LCRB       Load XC (Right Byte) Immediate       1.25
STU        Store UR into Program Memory         2.0
STUS       Store UR into Scratchpad Memory      1.75
STLS       Store LR into Scratchpad Memory      1.75
STAS       Store XA into Scratchpad Memory      1.75
STBS       Store XB into Scratchpad Memory      1.75
STCS       Store XC into Scratchpad Memory      1.75


ARITHMETIC INSTRUCTIONS

Mnemonic   Instruction Description                Execution Time

ADU        Add to UR from Program Memory          2.0
ADUS       Add to UR from Scratchpad Memory       1.5
ADBU       Add to UR (Right Byte) Immediate       1.25
ADBL       Add to LR (Right Byte) Immediate       1.25
ADBA       Add to XA (Right Byte) Immediate       1.25
ADBB       Add to XB (Right Byte) Immediate       1.25
ADBC       Add to XC (Right Byte) Immediate       1.25
AMS        Add to Scratchpad Memory from UR       2.5
DIV        Divide UR & LR by Program Memory       10.75
DIVS       Divide UR & LR by Scratchpad Memory    10.5
MPY        Multiply UR by Program Memory          6.0
MPYS       Multiply UR by Scratchpad Memory       5.75
SBU        Subtract Program Memory from UR        2.0
SBUS       Subtract Scratchpad Memory from UR     1.5

REGISTER INSTRUCTIONS

Mnemonic   Instruction Description                  Execution Time

ABSU       Absolute Value of UR                     1.25 - 1.75
CILB       Clear Indicator (Left Byte) Immediate    1.25
CIRB       Clear Indicator (Right Byte) Immediate   1.25
CPLU       Complement UR                            1.5
INV        Invert UR                                1.25
SILB       Set Indicator (Left Byte) Immediate      1.25
SIRB       Set Indicator (Right Byte) Immediate     1.25
TSU        Transfer SR to UR                        1.25
TUS        Transfer UR to SR                        1.25
XUA        Exchange UR and XA                       1.75
XUB        Exchange UR and XB                       1.75
XUC        Exchange UR and XC                       1.75
XUL        Exchange UR and LR                       1.75

INPUT/OUTPUT INSTRUCTIONS

Mnemonic   Instruction Description                   Execution Time

CLR        Clear Device Controller                   1.25
CLRI       Clear Interrupt Specified                 1.25
ENBL       Enable Interrupts from Device             1.25
INHB       Inhibit Interrupts from Device            1.25
SLZ        Shift UR Left - Enter Zeros               1.25 + .25(n)
SLZD       Shift Double Left - Enter Zeros           1.25 + .25(n)
SLZX       Shift Double Left by XC - Enter Zeros     1.25 + .25(n)
SRC        Shift UR Right - Circulate Bits           1.25 + .25(n)
SRCD       Shift Double Right - Circulate Bits       1.25 + .25(n)
SRS        Shift UR Right - Repeat Sign              1.25 + .25(n)
SRSD       Shift Double Right - Repeat Sign          1.25 + .25(n)
SRSX       Shift Double Right by XC - Repeat Sign    1.25 + .25(n)
SRZ        Shift UR Right - Enter Zeros              1.25 + .25(n)
SRZD       Shift Double Right - Enter Zeros          1.25 + .25(n)
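
The shift timings are the only entries above given as a formula rather than a
fixed value. The sketch below evaluates that formula, assuming that n is the
number of bit positions shifted and that times are in the same (unstated)
units as the rest of the listing; both points are assumptions, and the
function name is illustrative.

```python
def shift_time(n, base=1.25, per_bit=0.25):
    """Execution time of a shift instruction for an n-position shift."""
    if n < 0:
        raise ValueError("shift count must be non-negative")
    return base + per_bit * n

for count in (0, 4, 16):
    print(count, shift_time(count))
```

On this reading a 4-position shift costs 2.25 time units and a 16-position
shift costs 5.25, so long shifts are comparable in cost to a multiply.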

DOUBLE PRECISION INSTRUCTIONS

Mnemonic   Instruction Description                  Execution Time

ADD        Add Double from Program Memory           3.0
ADDS       Add Double from Scratchpad Memory        2.5
ADMS       Add Double to Scratchpad Memory          4.25
LDD        Load Double from Program Memory          3.0
LDDS       Load Double from Scratchpad Memory       2.5
STDS       Store Double into Scratchpad Memory      3.0
SBD        Subtract Double from Program Memory      3.0
SBDS       Subtract Double from Scratchpad Memory   2.5
ABSD       Absolute Value of Double Register        1.25 - 2.25
CPLD       Complement Double Register               1.75 - 2.0
ZRD        Zero Double Register                     1.5
NRM        Normalize Double Register                2.0 + .25(n)

LOGICAL INSTRUCTIONS

Mnemonic   Instruction Description               Execution Time

NDU        And to UR from Program Memory         2.0
NDUS       And to UR from Scratchpad Memory      1.5
ORU        Or to UR from Program Memory          2.0
ORUS       Or to UR from Scratchpad Memory       1.5
CBSP       Clear Bits Specified by Bit Mask      2.75
SBSP       Set Bits Specified by Bit Mask        2.75
SKSP       Skip on Bits Specified by Bit Mask    2.0 - 2.25

BRANCHING INSTRUCTIONS

Mnemonic   Instruction Description                      Execution Time

DSSZ       Decrement and Skip if Scratchpad is Zero     2.5 - 2.75
JINT       Jump to Service Interrupt                    8.0
JMP        Jump Unconditional                           1.5
JMPI       Jump Unconditional, Indirect                 2.0
JMS        Jump to Subroutine                           1.5
JMSI       Jump to Subroutine, Indirect                 2.0
JSNS       Jump After Device Sense                      2.75
RTN        Return from Subroutine                       1.0
RINT       Return from Interrupt Routine                5.0
SIE        Skip if Program Memory Equal to UR           2.0 - 2.2
SISE       Skip if Scratchpad Memory Equal to UR        1.75 - 2.0
SIG        Skip if Program Memory Greater than UR       2.0 - 2.2
SISG       Skip if Scratchpad Memory Greater than UR    1.75 - 2.0
SIL        Skip if Program Memory Less than UR          2.0 - 2.2
SISL       Skip if Scratchpad Memory Less than UR       1.75 - 2.0
SKLB       Skip on Indicator (Left Byte) Immediate      1.25 - 1.5
SKRB       Skip on Indicator (Right Byte) Immediate     1.75 - 2.0
SKR        Skip if Device is Ready                      1.75 - 2.0

ADDENDUM C

SIGNAL PROCESSOR MACRO LISTING

REAL VECTOR OPERATIONS


Vector  Clear 
Vector  Move 
Vector  Negate 
Vector  Add 
Vector  Subtract 
Vector  Multiply 
Vector  Divide 
Vector - Scalar Add
Vector - Scalar Multiply
Vector - Signed Squared
Vector Absolute Value
Vector Square Root
Vector Logarithm (Base 10)
Vector  Natural  Logarithm 
Vector  Exponential 
Vector  Sine 
Vector  Cosine 
Vector  Arctangent 
Vector Arctangent of (Y/X)
Sum  of  Vector  Elements 
Sum  of  Vector  Squares 
Dot  Product  of  Two  Vectors 
Vector Float
Vector Scan and Scale (Fix)


VECTOR MAXIMUM/MINIMUM OPERATIONS

Maximum  Element  in  a  Vector 
Minimum  Element  in  a  Vector 
Maximum  Magnitude  Element  in  a  Vector 
Minimum  Magnitude  Element  in  a  Vector 
Maximum  and  Minimum  of  a  Vector 
Maximum  and  Minimum  Magnitude  of  a  Vector 
Vector Maximum (of Two Vectors)
Vector Minimum (of Two Vectors)
Vector Maximum Magnitude of Two Vectors
Vector Minimum Magnitude of Two Vectors

VECTOR  FILTER  OPERATIONS 

Vector  Polynomial  Evaluate 
Difference  Equations 
4  Pole  Filter  (Difference  Equation) 

COMPLEX  VECTOR  OPERATIONS 

Complex  Vector  Multiply 
Complex  Vector  Reciprocal 
Complex  Vector  Magnitude  (Square) 
Rectangular  to  Polar  Conversion 
Polar  to  Rectangular  Conversion 

MATRIX OPERATIONS

Matrix Transpose
Matrix Multiply
Matrix Multiply (Dimension 32)
Matrix Inverse
Matrix Vector Multiply (3 x 3)
Matrix Vector Multiply (4 x 4)

FAST FOURIER TRANSFORM OPERATIONS

Complex  FFT 
Real  FFT 

Scrambled  to  True  Order  FFT  Passes 
Bit-Reverse Order an Array
Real  Transform  Unravel  Pass 

SIGNAL  PROCESSING  OPERATIONS 

Convolution (or Correlation)
Wiener-Levinson Algorithm
Bandpass  Filter 
Power  Spectrum 
Complex  Cepstrum 
Inverse  Complex  Cepstrum 
Schafer's Phase Unwrapping




APPENDIX  II 


DESIGN  GUIDELINES 


1. INTRODUCTION


The attainment of a reliable self-diagnosing design necessitates that
faults be detected when they occur. Subsequently, the errors produced
by the faults must be masked or the faulty unit should be removed from the
signal chain and replaced with an operational equivalent.

A  knowledge  of  the  error  characteristics  is  necessary  in  order  to  detect 
these  errors.  This  study  is  concerned  with  the  errors  produced  by  integrated 
semiconductor  circuits,  particularly  large  scale  integrated  (LSI)  circuit 
devices  utilized  in  processors,  which  includes  memory  and  microprocessor 
devices. These LSI device error characteristics differ from those of earlier,
smaller-scale devices in that one or more faults may produce one or more errors.

Wang and Lovelace(1) present data that indicate that single bit errors for
memory devices may represent only 75-80% of the total error population.
Their work also indicates that the composition of the failure population has a
significant effect on the reliability. Consequently, error protection techniques
have been required to handle both single and multiple stuck-at faults. Further
attempts at characterizing the error modes have been unsuccessful, primarily
because of insufficient data on available LSI devices. This is due, in part,
to the recent introduction of many of the parts, but also to the relatively high
obsolescence rate of some of these devices, such as random-access memory
(RAM) and read-only memory (ROM) devices. The net result of this condition
is that the number of errors that must be accommodated can vary from a
single error to the entire set of outputs or inputs that are related, such as
all of the outputs of a port. This model then results in the elimination of many
otherwise valuable techniques.

The self-diagnosing processor must, then, be able to detect multiple
internal errors and determine the location of the failure with sufficient
resolution so that the subsequent maintenance action is quick and
effective. This approach has the advantages of:

1)  Easy error detection, since the errors are detected
(usually) upon their first occurrence. The operation
of the processor in the failed state is considered during
the design.

2)  Automatic detection of a large percentage of errors.
Few undetected errors occur.

3)  Simple, fast diagnosis due to built-in error detection.

4)  More effective handling of inconsistent errors, such
as intermittents and transients. Diagnosis is initiated
immediately upon detection of the error.

5)  Manual maintenance is simplified and computer maintenance
costs are reduced.

1  Wang, S.O., Lovelace, K., "Improvement of Memory Reliability by Single
Bit Error Correction", COMCON 77.

Fault detection techniques have been emphasized because of the poor
reliability of systems possessing less than complete fault detection. For
systems based on standby principles, Losq(2) has shown that the coverage
affects the reliability in two ways: it reduces the maximum reliability of
the system and it modifies the shape of the perfect coverage reliability curve
by a factor of

exp(-λ(1-c)T),

where

λ = failure rate,
c = coverage (the percentage of faults detected, expressed as a fraction), and
T = time interval.
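
The sensitivity of this factor to coverage can be illustrated numerically. The following is a minimal sketch only; the failure rate and time interval are assumed values chosen for illustration and do not come from this study, and the function name is hypothetical.

```python
import math


def coverage_factor(failure_rate, coverage, interval):
    """Return exp(-lambda * (1 - c) * T), with coverage c given as a fraction."""
    return math.exp(-failure_rate * (1.0 - coverage) * interval)


if __name__ == "__main__":
    lam = 1.0e-4       # assumed failure rate, failures per hour (illustrative)
    interval = 1000.0  # assumed time interval, hours (illustrative)
    for c in (1.00, 0.99, 0.95, 0.90):
        factor = coverage_factor(lam, c, interval)
        print(f"coverage {c:.2f} -> factor {factor:.6f}")
```

With perfect coverage (c = 1) the factor is exactly 1 and the perfect coverage curve is unchanged; any shortfall in coverage scales the curve down exponentially in λT.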

The price of increased fault tolerance obtained through the commitment
of additional hardware resources is an increase in failure probability due to
these added resources. Techniques that provide reliability enhancement and
self-diagnosis are particularly effective for these applications provided that
the maintenance intervals are short compared to the system MTBF.

The  resulting  guidelines  should 

1)  provide  computational  capability  that  is  consistent 
with  the  previously  identified  baseline  processing 
requirements, 

2)  provide  a  modular  processor  architecture  that  is 
adaptable  to  changing  requirements  driven  by  either 
changing  mission  or  variety  of  application  requirements, 

3)  match the 1977 technology and maintain flexibility
with respect to anticipated improvements in the state
of the art.


2  Losq, J., "Influence of Fault Detection and Switching Mechanisms on the
Reliability of Standby Systems", FTC 75.


II.  REVIEW  OF  RELIABILITY  TECHNIQUES  FOR  LSI  DEVICES 


There are two basic techniques for self-diagnosing systems with automatic
detection:

1)  The information signals of the system are encoded in
such a manner that the signals form a code word in an
error detecting/correcting code under fault-free conditions.
When a detectable fault occurs, an error is
produced that is a non-code word. An example of a
single error-detecting code is replication with
comparison/voting.

2)  Periodic diagnosis of all modules for error detection.

By itself, the second technique is not recommended for fault diagnosis
because:

1)  Inconsistent errors may not be detected and their effect
on the state and data base of the system cannot be
predicted.

2)  Rollback/restart snapshots of the state-of-the-machine
requirements are frequently not consistent with
real-time applications.

3)  Diagnostic tools for producing test vectors for multiple
faults and errors are only now being developed and the
fault location accuracy is quite suspect.

Combinations of these two basic approaches are also utilized and will
be discussed. The periodic approach will most likely be the basis of the
manual diagnosis that supplements the automatic on-line error detection and
locates the failed component to the self-diagnosing processor replaceable
module level. Thus, the overall maintenance action is a combination of the
spatial redundancy of the coding approach and the temporal redundancy of the
periodic diagnosis.

1.  Coding  Techniques 

The simplest codes are those of replication combined with a form of
comparison. Duplication with comparison is perhaps the simplest error
detecting technique. In this technique, two independent systems compute the
same function and the results are compared to detect differences. When an
error exists, the results of the independent systems will differ and the
comparison will detect the difference. The comparison can usually provide
location information (identify the bit location) if desired.
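
Duplication with comparison can be sketched in a few lines. This is an illustrative sketch only: the two "systems" are represented by their integer outputs, and the function name and the 16-bit width are assumptions made for the example.

```python
def compare_outputs(a, b, width=16):
    """Compare the outputs of two replicated units.

    Returns (error_detected, differing_bit_positions). The comparison
    detects a disagreement but cannot say which unit is the faulty one.
    """
    mask = (1 << width) - 1
    diff = (a ^ b) & mask  # a 1 marks every bit where the units disagree
    positions = [i for i in range(width) if (diff >> i) & 1]
    return diff != 0, positions


if __name__ == "__main__":
    print(compare_outputs(0x1234, 0x1234))  # units agree
    print(compare_outputs(0b1010, 0b1000))  # units differ in bit 1
```

Note that the bit location identifies where the outputs differ, not which of the two systems produced the error; that distinction requires a third opinion, as in triplication.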
The simplest error correcting technique using coding involves the use of
triplication. In this technique, three independent systems compute
the same function and the output is the majority function of the results of
the three systems. This voting of the output is performed on a signal-by-signal
basis and the effect of single errors is masked. Such systems are
commonly designated as triple-modular redundancy (TMR). As in duplication
with comparison, location information can be derived.
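
The signal-by-signal majority function can be sketched as a bitwise operation on three words. This is a minimal illustration, not a description of the voter hardware in this report; the module names A, B, C and the function names are hypothetical.

```python
def tmr_vote(a, b, c):
    """Bitwise majority of three words: each output bit is the value
    held by at least two of the three inputs."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_modules(a, b, c):
    """Derive location information: modules whose output differs from the vote."""
    voted = tmr_vote(a, b, c)
    return [name for name, word in (("A", a), ("B", b), ("C", c)) if word != voted]


if __name__ == "__main__":
    good = 0b1100
    print(bin(tmr_vote(good, good, 0b0100)))       # single error masked
    print(disagreeing_modules(good, good, 0b0100))  # faulty module located
```

Because the vote is taken bit by bit, errors in two different modules are still masked as long as they do not fall on the same signal.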

Extension of the degree of replication beyond three has been labeled
N-modular redundancy (NMR), where N modules are used to execute the same
function and (N-1)/2 or fewer failures are masked. The majority function,
which produces the output, has N inputs and a threshold equal to the smallest
integer greater than or equal to N/2. As with other replication schemes,
location can be derived at the cost of additional resources.

Variations of the replication code include systems that both correct
errors and locate the source of the error. As errored modules are
identified, they are switched out of the system and replaced by standby
modules. When inserting the standby modules into the system, any internal
memory must be initialized.

Another variation is an adaptive technique, where the threshold of the
majority function is reduced as errors are detected and the offending module
is switched out of the system. Initially, the threshold is set to N/2; then,
as errors are detected, the threshold is lowered to (N-1)/2, (N-2)/2, (N-3)/2,
and so on, until the number of modules is reduced to two or three.
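
The NMR threshold function and the adaptive variation can be sketched together. This is a simplified illustration under stated assumptions: the threshold is taken as the smallest integer not less than m/2 for the m modules currently in service, a module is "switched out" simply by dropping it from a dictionary, and the module names and 8-bit width are hypothetical.

```python
import math


def threshold_vote(outputs, width=8):
    """Bitwise threshold function over m module outputs: an output bit is 1
    when at least ceil(m / 2) modules present a 1 on that signal."""
    m = len(outputs)
    threshold = math.ceil(m / 2)
    voted = 0
    for bit in range(width):
        ones = sum((word >> bit) & 1 for word in outputs)
        if ones >= threshold:
            voted |= 1 << bit
    return voted


def adaptive_step(active, width=8):
    """Vote over the active modules, then switch out every module that
    disagrees with the voted result. The threshold adapts automatically
    because it is recomputed from the number of remaining modules."""
    voted = threshold_vote(list(active.values()), width)
    survivors = {name: word for name, word in active.items() if word == voted}
    return voted, survivors


if __name__ == "__main__":
    active = {"M1": 0x5A, "M2": 0x5A, "M3": 0x5B, "M4": 0x5A, "M5": 0x5A}
    voted, active = adaptive_step(active)  # M3 is removed, threshold was 3
    print(hex(voted), sorted(active))
    active["M1"] = 0xDA                    # a second module fails later
    voted, active = adaptive_step(active)  # threshold is now 2
    print(hex(voted), sorted(active))
```

In this sketch an exact tie among an even number of modules resolves to 1, which is an arbitrary choice of the example rather than anything stated in the text.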

2.  Conventional  Coding  Techniques 

Conventional coding techniques have been developed that detect and/or
correct errors. Only a subset of all the coding techniques is of interest
for  this  study.  This  subset  is  useful  for  checking  computations  and  is  usually 
restricted  to  binary  codes  or  codes  that  are  closely  related  to  them.  A 
successful  utilization  of  computational  coding  techniques  in  self-diagnosing 
systems  depends  largely  on  the  nature  of  the  function  to  be  protected  by  the 
code.  Hence,  after  an  initial  general  discussion  of  codes,  the  effectiveness 
of  coding  techniques  will  be  considered  with  respect  to  the  memory,  processor, 
control,  and  internal  buses  of  a  self-diagnosing  processor. 

The  computational  codes  that  were  considered  are  the  linear  block  codes, 
the  arithmetic  codes,  and  checksum  codes.  Hamming  and  parity  and  b-adjacent 
codes  were  the  linear  block  codes  specifically  evaluated  for  transmission  and 
storage.  Arithmetic  codes  were  examined,  primarily  with  respect  to 
arithmetic  operations,  although  their  use  for  the  protection  of  storage  was 
examined. So-called low cost codes, as defined by Avizienis(3), received most
of  the  attention.  These  include  the  AN  and  residue  codes.  The  b-bit  byte 
checksum  codes  examined  were  those  having  a  check  symbol  of  the  form  . 
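
The AN and residue codes mentioned above can be illustrated with a small example. This is a hedged sketch: the check modulus A = 3 is chosen purely for illustration, the function names are hypothetical, and the `fault_mask` parameter is an artificial device for simulating a faulty adder.

```python
A = 3  # illustrative multiplier / check modulus (an assumption of this sketch)


def an_encode(n):
    """AN code: the code word for N is the product A * N."""
    return A * n


def an_check(codeword):
    """A valid AN code word is always divisible by A."""
    return codeword % A == 0


def an_decode(codeword):
    return codeword // A


def residue(n):
    """Residue code: the check symbol is N mod A, carried beside the data."""
    return n % A


def residue_checked_add(x, rx, y, ry, fault_mask=0):
    """Add the data and the check symbols in separate paths, then compare.

    fault_mask flips bits of the main adder output to simulate a fault.
    Returns (sum, predicted_residue, check_passed).
    """
    total = (x + y) ^ fault_mask
    predicted = (rx + ry) % A
    return total, predicted, residue(total) == predicted


if __name__ == "__main__":
    s = an_encode(7) + an_encode(5)  # addition preserves the AN code
    print(an_check(s), an_decode(s))
    print(an_check(an_encode(7) ^ 1))  # a single-bit error is detected
    print(residue_checked_add(25, residue(25), 17, residue(17)))
    print(residue_checked_add(25, residue(25), 17, residue(17), fault_mask=0b100))
```

Both codes are preserved by addition, which is why they suit arithmetic units; a single-bit error changes the value by a power of two, which is never a multiple of 3, so it is always caught in this example.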

3  Avizienis, A., "Digital Fault Diagnosis by Low Cost Arithmetic Coding
Techniques", Proc. Purdue Centennial Year Symp. Information
Processing, 1:81-91.


Application of these techniques to large-scale integrated logic circuits
has  not  generally  been  successful  in  the  sense  that  the  implementations 
were  low  cost.  There  are  a  number  of  reasons  for  this:  first,  for  devices 
such  as  a  microprocessor,  there  is  a  mixture  of  structures  and  operations 
that  the  code  must  span  if  it  is  to  be  applied  external  to  the  device.  Since  the 
codes,  in  general,  are  matched  to  the  structure  and  operation,  this  method 
of  attack  leads  to  difficulties  that  are  still  unsolved. 

If the coding techniques are applied within the device, the size of the
chip must be expanded and the number of pins increases unless the pins can be
time shared. Generally, this is not possible. Applying coding techniques
within a device of the size of a microprocessor at the register-to-register
level leads to a redesign of the function and usually necessitates an increase
in chip size. Since these LSI devices are already at or near the current
state-of-the-art in integration, the increase in chip size results in a loss
of yield, which is already relatively low, at least compared to small-scale
integration (SSI). Since the coding techniques examined thus far result in an
increase in circuitry of at least twice the original device, the application
of coding techniques results in uneconomical designs because of the low yield.

The second major source of difficulty in applying coding techniques to
highly integrated devices is the lack of good error models that relate faults
and errors. As indicated in the Introduction, it is believed that single
stuck-at fault modeling results in insufficient fault coverage. It has been shown
that the effectiveness of the error detection is very sensitive to coverage,
particularly in the range of interest for self-diagnosis. Hence, it has
been decided that multiple errors must be considered. Implementation
costs of multiple error codes in the range of four to eight bits have been
found to rise rapidly. Even for the so-called unidirectional faults(4), the
implementation costs increase significantly and the computational delays
increase with increasing word length.

As will be seen in the application of redundancy techniques to the
various functional units of a processor, most redundancy techniques that are
theoretically interesting are only applicable at the component level. Technology
constraints at the LSI level of integration tend to dictate that redundancy should
be applied over chips, not within the components of chips. Hence, relatively
few redundancy techniques remain relevant. Consequently, architectural
considerations are of primary importance in the design of a self-diagnosing
processor. Recovery from the fault, beginning with the processing of any
location information through, possibly, the restoration of the processing and
redundancy, is the major issue. As will be seen, however, a number of these
redundancy techniques, originally intended for low-level application, form
the basis for enhancing system reliability.


4  All  components  of  the  error  value  have  the  same  sign.  That  is,  the  only 
erroneous  bits  are  either  l's  changed  to  0's  or  0's  changed  to  l's  but  not 
both.
3.  Processors

Bit-slice microprocessors such as the AMD 2901, in contrast to monolithic
microprocessors, allow the use of redundancy techniques at a lower
level and, therefore, are better candidates for redundancy techniques. Both
arithmetic and parity prediction techniques have been successfully applied
to arithmetic operations and can be effectively used for the protection of the
microprocessor arithmetic logic unit (ALU). But these techniques do not
check logical operations. Of greater consequence is the fact that low-cost
coding techniques for checking logical operations for error detection have
not been discovered. It is generally accepted that the simplest error
detecting codes for logical operations amount to duplication. Previous
implementations of error detecting designs frequently resorted to duplication.
Variants of these codes have been developed for ripple carry arithmetic units
and carry look-ahead designs in which the carry circuits are disabled. For
bit-slice microprocessors similar to the 2900 series, which incorporate byte
carry look-ahead, the effect of incorporating these arithmetic coding techniques
is significant. High-speed arithmetic structures, using cascades of 2901's and
one or more special carry look-ahead devices, can be implemented. Speedup
techniques such as these considerably increase the execution speed of these
structures compared with ripple-carry techniques, particularly for long words.
Addition of this code circuitry to the microprocessor and the high-speed carry
look-ahead would reduce the execution speed of the microprocessor and,
possibly, the high-speed carry look-ahead. Also, larger chips would
be required to implement this additional circuitry, barring the use of higher
density fabrication techniques. Hence, this development was not pursued
further.
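
Parity prediction for an adder can be sketched as follows. This is an illustrative model, not the circuitry evaluated in this study: since each sum bit is the exclusive-OR of the two operand bits and the carry into that position, the parity of the sum equals the parity of the operands combined with the parity of the internal carries. The function names, the 8-bit width, and the `fault_mask` used to simulate an adder fault are assumptions of the example.

```python
def parity(x):
    """Return 1 when x contains an odd number of 1 bits."""
    return bin(x).count("1") & 1


def carries_into_each_bit(a, b, width):
    """Independently derive the carry-in vector; bit i is the carry entering
    position i. This stands in for the duplicated carry logic of the checker."""
    carry = 0
    vector = 0
    for i in range(width):
        if carry:
            vector |= 1 << i
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return vector


def parity_checked_add(a, b, width=8, fault_mask=0):
    """Add a and b, then compare the parity of the result against the parity
    predicted from the operands and carries. fault_mask flips result bits."""
    mask = (1 << width) - 1
    result = ((a + b) & mask) ^ fault_mask
    predicted = (parity(a & mask) ^ parity(b & mask)
                 ^ parity(carries_into_each_bit(a, b, width)))
    return result, parity(result) == predicted


if __name__ == "__main__":
    print(parity_checked_add(0x3C, 0x0F))                   # fault free
    print(parity_checked_add(0x3C, 0x0F, fault_mask=0x01))  # single error caught
    print(parity_checked_add(0x3C, 0x0F, fault_mask=0x03))  # double error missed
```

The last case shows the limitation that matters for LSI devices: a single parity check misses any even number of erroneous bits, which is one reason the multiple error model weighs against such codes.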

Alternatively, the logical operations can be removed from the ALU and
implemented separately. But this approach suffers from increased delay
penalties as well as the increased implementation costs cited. Implementation
of the logic execution circuitry external to the microprocessor suffers from
the difficulty that one or more of the operands must come from the
microprocessor's register file through the output port. This increases the execution
time for single-register file sourced operands and, probably, would double the
cycle time for two register file sourced operands. Thus, this approach was
also not recommended.

Instead, it was concluded that for a self-diagnosing computer,
replication offered the best trade-off in terms of protection and implementation
and execution time resources for the current state-of-the-art of integrated
circuit technology. Conventional coding techniques were less effective for
the multiple error case, when applied to the highly integrated LSI devices,
than the single error case associated with the medium and small-scale
integrated circuits.

For monolithic microprocessors, replication appears to be the best
solution. As will be seen after memory and bus functions are examined,
TMR is believed to be the best general solution for monolithic processor
applications.
4.  Memory

Conventional coding techniques are much more favorably applied to
memory functions than processors. Part of the reason is that coder/decoders
for arithmetic and arbitrary logic are just about as complex as the function
they are protecting. But this is not true for memories. For instance, the
encoder/decoder for a 4K by 29 bit word memory(5) can be realized for about
T’i  of  the  simple  memory  resources.  In  addition,  semiconductor  LSI  memory 
devices  have  developed  in  such  a  way  that  many  of  the  problems  associated 
with  other  memory  technologies  have  been  eliminated.  Addressing  errors 
are  confined  to  a  single  chip  under  the  single  fault  assumption  by  including 
on  each  chip  its  own  decoder,  along  with  the  read  and  write  amplifiers  and 
read/write control circuitry.

The evaluation of the implementation requirements of codes for the
protection of memory is primarily concerned with three major contributors:

1)  The number of redundant or check bits that must be added
on a per word basis.

2)  The complexity of the associated encoders and decoders.

3)  Additional delay incurred as a result of adding protection
devices, since these delays usually increase the
address and/or the cycle time of the memory.

Wensley, et al.(5), discuss the bounds on redundancy codes, properties
of Hamming, Hong-Patel, Abramson, and Gilbert codes that almost achieve these
bounds, and the performance of these error correcting codes. The lower
bounds for the number of redundant digits, r, as a function of the number of
information bits, k, are listed in Table II-1, which is taken from Wensley(5).

The  bound  varies  with  the  number  of  burst  code  bits  in  a  protected  byte  of 
width  b,  and  the  type  of  code.  The  S  ,  columns  list  the  number  of  digits 
required  for  cyclic  burst  binary  codes  and  the  S  columns  list  those  for  the 
single byte correcting codes. (Note that r in bits is equal to b multiplied by
the entry in the appropriate column, either S_ or S .) Hamming codes, with
b = 1, and the Hong-Patel codes for b = 2 and T, achieve the redundancy implied
by the entries listed in the S,, columns.

r 

An indication of the decoder complexity for a particular cellular decoding
scheme for generalized Hamming codes discussed by Wensley in (5) is presented
in Table II-2 for a 24 bit word memory of 4096 words. Memory chip size is
maintained at 4096 bits by configuring the chips as follows: (1 bit wide x 4096),
(2 bits wide x 2048), (4 bits wide x 1024), and (8 bits wide x 512). The codes
are single, b bit wide, error correcting Hamming codes. For this
implementation, the 2-bit wide byte (2 bits wide x 2048 word memory chip)
yields the best design in terms of implementation cost, i.e., number
of chips.

5  Wensley, J.H., Levitt, K.N., Green, M.W., Goldberg, J., and Neumann,
"Design of a Fault Tolerant Airborne Digital Computer", Vol. I, Stanford
Research Institute, N74-17909, p. 26.
SUMMARY OF DECODER COMPLEXITY

These codes mask all single byte errors. To achieve diagnosis of the
errors, they should be augmented to provide detection. In some cases, this
will change the number of redundant check bits and the decoder complexity.
However, it is still believed that the optimum byte width is b = 2.

Another less elegant coding approach is to employ parity. Jack, et al.(6),
have compared various versions of parity with checksum and Hamming codes for
the purpose of achieving a self-checking(7) semiconductor memory. In this
paper, Jack, et al., point out the importance of coverage, particularly with
respect to multiple errors in memory devices. Three different forms of
parity are considered. They are:

1)  Single bit parity across the entire memory word.

2)  Byte-wide parity across 8-bit bytes using a single
parity bit per byte.

3)  Chip-wide parity provides one parity group for each bit
position in a chip for a group of bytes. For a 16-bit
data word implemented using 4-bit wide memory devices,
four parity check bits are required with four parity
checkers.
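
The difference between the three forms of parity can be seen by simulating the failure of one whole memory chip. The sketch below reflects one reading of the descriptions above, for a 16-bit word stored in four 4-bit wide devices; the function names and the stored value are hypothetical.

```python
def parity(x):
    """Return 1 when x contains an odd number of 1 bits."""
    return bin(x).count("1") & 1


def word_parity(word):
    """Form 1: a single parity bit across the entire 16-bit word."""
    return parity(word & 0xFFFF)


def byte_parity(word):
    """Form 2: one parity bit for each 8-bit byte (low byte first)."""
    return [parity((word >> shift) & 0xFF) for shift in (0, 8)]


def chip_wide_parity(word, chip_width=4, chips=4):
    """Form 3: one parity bit per bit position within a chip, formed by
    combining the same bit position across all chips (four check bits here)."""
    mask = (1 << chip_width) - 1
    check = 0
    for chip in range(chips):
        check ^= (word >> (chip * chip_width)) & mask
    return check


if __name__ == "__main__":
    stored = 0xA5C3
    read = stored ^ 0x00F0  # the chip holding bits 4-7 inverts all its outputs
    print(word_parity(stored) == word_parity(read))            # failure missed
    print(byte_parity(stored) == byte_parity(read))            # failure missed
    print(chip_wide_parity(stored) == chip_wide_parity(read))  # failure caught
```

A failed chip can corrupt several bits of one word at once. Word and byte parity miss the failure whenever an even number of bits changes, whereas each chip-wide parity group contains at most one bit from any single chip and so always sees an odd change.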

From coverage considerations, they conclude that, on the average,
chip-wide parity and Hamming-like codes provide the best self-checking coverage
for data faults of any of the detection approaches investigated. They are also
unsurpassed in terms of worst-case possible failure modes. In presenting the
results of the coverage analysis, they note that no exact overall coverage
figures can be determined unless all the failure modes and the likelihood of
their occurrence for the semiconductor devices are known. Their results for
a representative memory requirement of 1K words x 16 bits under conditions
of a single fault are shown in Table II-3. They observe that, for a one
microsecond cycle time memory, there is no execution time penalty for
chip-wide parity or Hamming code checkers because the overhead of 85 nanoseconds
for each word can be overlapped. The variation in delay across the approaches
appears to be sufficiently small that delay times should not be a major factor at
this speed of memory operation (1 microsecond cycle time).

The results with respect to the Hamming and chip-wide parity code
approaches are summarized in Table II-4. A comparison of the results shows
that:

6  Jack, Kinney, L.L., Berg, K.O., "Comparison of Alternative
Self-Checking Techniques in Semiconductor Memories", COMCON 77,
pg. 170-173.

7  A totally self-checking circuit is a circuit that is self-testing for a
normal input set, N, and a non-trivial fault set, F, and fault secure
for N and a non-trivial fault set, F. A circuit is self-testing if, for
every fault from a prescribed set, the circuit produces a non-code space
output for at least one code input. A circuit is fault secure if, for
every fault from a prescribed set, the circuit never produces an incorrect
code space output for code space inputs.


TABLE II-3.  MEMORY SELF-TEST METHODS COMPARISON CHART

TABLE II-4.  SUMMARY IMPLEMENTATION TRADE-OFF COMPARISON


1)  Both chip-wide and Hamming-like codes provide extremely
high coverage for massive on-chip failure modes.

2)  Both Hamming-like codes and chip-wide parity achieve
100% error detection for either single device data or
address faults for chips no wider than 4 bits.

3)  Hamming-like codes give better detection for multiple
device failures than simple parity schemes at an
increase in hardware.

4)  Chip-wide parity requires fewer parity bits, less
detection circuitry, and has less interconnect complexity.

5)  Neither approach permits errored data or instructions
to be passed to the CPU assuming single chip failure.

Of the above approaches, only Hamming-like codes have the potential
for error correcting capability. Techniques for designing error-free decoding,
coupled with error correcting memory, have been described by Carter, et al.(8).
Single error correcting/double error detecting (SEC/DED) Hamming-like codes
have been used to protect memories where it is assumed that single failures
affect one bit of the word and two failures affect two bits of the retrieved
word. Early versions of this approach used "self-testable" SEC/DED decoders
and encoders or translators for converting from bus parity code to memory
coding and vice versa. Here self-testing is understood to mean circuits that
test the proper functioning of every component during normal operation. The
decoding circuitry is dynamically tested while it performs its function of
correcting erroneous data without mistaking these errors for errors caused
by circuit faults and vice versa. The results of applying these techniques to
memories of 32, 64, and 128 bits are shown in Table II-5 from Carter, et al.
The actual data bits are listed in the column labeled k and the redundant or
check bits required are shown under the r column (n is the sum k + r).
The next column to the right, labeled "Conventional SEC/DED to Byte Parity
Circuits", lists the circuits needed to translate from the SEC/DED memory
code to the bus parity code using conventional design techniques. The next
column to the right, labeled "Self-Checking SEC/DED to Byte Parity Circuits",
gives the figures for the self-testing version (including translation). As seen
in the last column, only a small increase was required to achieve self-checking
and this difference decreases with word length. Hence, translators can be a
source of large implementation cost and either should be avoided if possible or
should be minimized by choosing compatible codes to the extent possible.
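
The behavior of a SEC/DED Hamming-like code can be shown on a very small word. The following sketch uses an extended Hamming code with 4 data bits and 4 check bits; this tiny size is an assumption made to keep the example readable and is not one of the 32, 64, or 128 bit configurations discussed above. Function names are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit SEC/DED code word (a list of bits).

    Positions 1, 2, and 4 hold Hamming check bits, positions 3, 5, 6, and 7
    hold the data, and position 0 holds an overall parity bit.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) & 1
    return code


def decode(code):
    """Return (nibble, status) with status 'ok', 'corrected', or 'double'."""
    code = list(code)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position
    overall = sum(code) & 1
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd overall parity: treat as a single error. The syndrome names the
        # failing position (0 means the overall parity bit itself).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even overall parity with a nonzero syndrome: two bits are in error.
        return None, "double"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


if __name__ == "__main__":
    word = encode(0b1011)
    print(decode(word))
    word[5] ^= 1          # one failure affects one bit
    print(decode(word))
    word[2] ^= 1          # a second failure affects a second bit
    print(decode(word))
```

This matches the failure assumption stated above: one failure disturbs one bit and is corrected, two failures disturb two bits and are detected but not corrected.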


8  Carter, W.C., Jessep, D.C., and Wadia, A., "Error-Free Decoding for
Failure Tolerant Memories", FTC 71.
TABLE II-5.  COMPARISON OF CONVENTIONAL AND SELF-CHECKING
MEMORY IMPLEMENTATIONS


                   CONVENTIONAL       SELF-CHECKING
                   SEC/DED TO BYTE    SEC/DED TO BYTE
  n     k    r     PARITY CIRCUITS    PARITY CIRCUITS    % INCREASE

 32    26    6          1050               1130             7.4

 64    57    7          2300               2380             3.5

128   120    8          4950               5000             1.1


n = Total Number of Bits in Message Being Protected

k = Total Number of Information Bits in Message Being Protected

r = Total Number of "Check" Redundancy Bits in Message Being Protected

n = k + r


This work has led to the development of techniques for designing
self-checking circuits not only for memories but for all computer functions, such
as the arithmetic and logic unit and control. In the case of memories, Carter
and McCarthy(9) have extended these principles and have designed a fault-tolerant
memory based on a modified Hamming-like SEC/DED code. The logic of this
memory is designed to be fault secure and self-testing, and is claimed to exhibit
good cost performance. The testing procedures are designed to detect faults
and prevent error accumulation. In the recovery process, single error
correction can be validated and most double errors caused by two faults
corrected.

Husband and Szygenda(10) have provided a detailed synthesis and analysis
of a cost effective, ultrareliable, high speed, semiconductor memory system.
A 16K word by 64 bit memory system with a 250 nanosecond cycle time was
designed that corrected over 99% of all single faults. The approach is based
on a single error Hamming code, and support electronics that are designed
to cause all faults outside the memory proper to produce no more than one
bit error in any one memory word. The error-correcting circuits make up
less than 2% of the total circuitry and the increase in circuitry over the simplex
system is less than 20%. Cycle time is not increased unless a fault occurs.
The implementation results are tabulated in Table II-6, taken from Husband and
Szygenda's paper.

9  Carter, W.C., McCarthy, C.E., "Implementation of Experimental Fault
Tolerant Memory", IEEE Trans. on Computers, C-25, No. 6, June 1976.

10  Husband, E.W., Szygenda, S.A., "Synthesis and Analysis of a Cost Effective,
Ultrareliable, Highspeed, Semiconductor Memory System", IEEE Trans. on
Reliability, Vol. R-25, No. 3, August 1976.


Table II-5, first row:  n = 32,  k = 26,  r = 6


TABLE II-6

COMPARISON OF 16K BY 64 BITS
SIMPLEX AND FAULT TOLERANT MEMORY SYSTEM


                                FAULT
                    SIMPLEX     TOLERANT    INCREASE IN    PERCENTAGE
                    SYSTEM      SYSTEM      DEVICES        OF INCREASE

Memory               4096        4608          512            12.5

Address                48         128           80           167

Write Enable            6          16           10           167

Chip Select            10          31           21           210

Data In                 8           9            1            12.5

Data Out               40          45            5            12.5

Error Correction        -          96           96             -
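
The device counts in Table II-6 can be checked against the percentages quoted in the text. This is a rough arithmetic sketch only: it assumes the rows shown are the complete table and that the simplex system contains no error correction devices, neither of which is stated explicitly.

```python
# (simplex devices, fault tolerant devices) for each row of Table II-6.
TABLE_II_6 = {
    "Memory": (4096, 4608),
    "Address": (48, 128),
    "Write Enable": (6, 16),
    "Chip Select": (10, 31),
    "Data In": (8, 9),
    "Data Out": (40, 45),
    "Error Correction": (0, 96),  # assumption: absent from the simplex system
}


def totals(table):
    """Return (total simplex devices, total fault tolerant devices)."""
    simplex = sum(s for s, _ in table.values())
    tolerant = sum(t for _, t in table.values())
    return simplex, tolerant


def percent_increase(simplex, tolerant):
    return 100.0 * (tolerant - simplex) / simplex


if __name__ == "__main__":
    simplex, tolerant = totals(TABLE_II_6)
    print(simplex, tolerant)
    print(round(percent_increase(simplex, tolerant), 1))  # overall increase
    print(round(100.0 * 96 / tolerant, 2))                # error correction share
```

Under these assumptions the totals are 4208 and 4933 devices, an overall increase of about 17.2%, with error correction at about 1.95% of the fault tolerant system. Both figures are consistent with the "less than 20%" and "less than 2%" statements in the text.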

To summarize, the particular memory protection approach is a function
of the memory cycle time, the word length, the protection used in the rest of
the system, and the size (capacity) of the memory. Assuming a system
compatible memory cycle time and a memory device selection based primarily
on minimum cost, the recommended protection approach as a function of
memory size is given in Table II-7.


5.  Control 

Until recently, replication was the only known method of control unit
error detection(11). However, the introduction of microprogrammed techniques
to control unit design has eased the problem since the complexity of the unit
is reduced. This has led to the study of low-cost techniques for the detection
of control unit errors. Toy, et al.(12) designed a self-checking, microprogrammed
control unit based on a combination of parity checking, bit compare and
interleaving. It has been shown that this design can be made totally self-checking
with respect to single faults. Riaz(13) showed that totally self-checking
concepts could be applied to synchronous sequential machines (Moore type) in
addition to the combinational circuits considered earlier, assuming that the
clock line is fault-free. Later, Ozguner(14) developed approaches for designing
totally self-checking asynchronous sequential machines. Ho(15) describes the
design of totally self-checking computers including the microprogrammed
control unit. This machine is designed to halt upon the detection of a fault
and can be instrumented to provide fault location information to within a
few gate levels. Ashjaee and Reddy describe totally self-checking checkers
for separable codes and point out that, for certain designs and separable codes,
the corresponding checkers cannot be realized. For Type I checkers (see
Reddy) of totally self-checking systems, they define sufficient conditions
on separable codes that insure that the checker can be realized.

Unfortunately, most of the work is based on a single stuck-at-one or
stuck-at-zero error model for the checker. Consequently, it is not compatible
with an LSI implementation of the checker. However, a lower level of
integration implementation, such as MSI or SSI, would probably meet the single
error model. Such an approach would complement many bit-slice microprocessor
control units since they are usually implemented with relatively low-level
integration devices combined with high-speed, highly integrated memory
devices, e.g., read-only memory (ROM) and programmable read-only
memory (PROM).


11 Eckert, J.P., Weiner, J.R., Welsh, H.F., Mitchell, H.F., "The UNIVAC
System", AIEE-IRE Conf. 6-16, 1951.

12 Toy, W.N., "Modular LSI Control Logic Design With Error Detection",
IEEE TC-20(2), 1971, pg. 161-162.

13 Diaz, M., "Design of Totally Self-Checking and Fail Safe Sequential
Machines", Proc. Fourth Annual International Symposium on Fault
Tolerant Computing, June 1974, pg. 3-19 - 3-24.

14 Ozguner, F., "Design of Totally Self-Checking Asynchronous Sequential
Machines", Coordinated Science Laboratory Report R-679, Univ. of
Illinois, May 1975.

15 Ho, D.S., "The Design of Totally Self-Checking Systems", PhD Thesis,
Univ. of Illinois, 1976.

16 Reddy, S.M., Ashjaee, M.J., "On Totally Self-Checking Checkers for
Separable Codes", IEEE TC, Vol. C-26, No. 8, August 1977.


Perhaps more importantly, none of the coding approaches investigated
resulted in low-cost implementations. When all the factors were considered,
the implementation costs were in excess of duplication and introduced
additional delays in the control loop which, time-wise, was already the
limiting path.

For these reasons, replication is recommended for the bit-slice
microprocessor control unit. For systems that require essentially
uninterrupted processing, a form of triplication is recommended. Assuming
that the LRU could be as large as the simplex control unit, augmented TMR
would provide sufficient error location resolution, so that resource costs
would approximate that of triplication plus the voters. Speed would be
reduced only slightly due to the added delays of the voters.
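
The voting and error location idea can be sketched in a few lines of Python. This is only an illustration of bitwise two-of-three voting with a dissent indicator, not the augmented TMR design of this report; the function names are hypothetical and each redundant control word is modeled as an integer.

```python
def vote(a, b, c):
    """Bitwise 2-of-3 majority of three redundant words (Python ints)."""
    return (a & b) | (a & c) | (b & c)


def vote_and_locate(a, b, c):
    """Return (voted_word, index of the single dissenting copy or None)."""
    voted = vote(a, b, c)
    dissenters = [i for i, word in enumerate((a, b, c)) if word != voted]
    # With one faulty copy, exactly one word differs from the voted result.
    suspect = dissenters[0] if len(dissenters) == 1 else None
    return voted, suspect


print(vote_and_locate(0b1010, 0b1010, 0b0010))
```

When all three copies agree, or when more than one copy disagrees with the voted word, the sketch reports no single suspect.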

Bit-slice based microprocessor systems that can tolerate "short"
interruptions have the option of considering either replication or some form
of coding. However, even here, replication is favored because real-time error
masking of error correction codes is not needed and replication provides at
least one identical copy of the unfailed structure after the occurrence of
the error.

For  monolithic  microprocessors,  the  control  unit  is  considered  as  a 
part  of  the  CPU  and  the  recommendations  made  in  the  processor  section  apply. 
Some  form  of  TMR  is  recommended. 

6.  Buses 

Bus protection depends on the buses and interface failure modes. Military
standards, such as MIL-STD-1553A, and the environment strongly affect
the bus protection approaches that can be considered for a particular
application. The source and sink of the information transmitted over the bus
also strongly affect the protection approach because, if the code employed at
either end of the bus is different from that used by the bus, a code
translator may be required. If all three employ different coding schemes, two
translators are required. Depending on the code pair, the translator may be
quite complex and itself require protection. It, therefore, behooves the
system designer to utilize as few different codes throughout the system as
possible. Where different codes are required they should be selected to
minimize the translator requirements, and vice versa if two-way communication
is to be maintained across the bus.
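
The translator count described above can be expressed as a small Python sketch for a one-way path from source through bus to sink. The function name and the code labels in the example are assumptions made for illustration.

```python
def translators_needed(source_code, bus_code, sink_code):
    """Count code translators on a one-way path source -> bus -> sink."""
    count = 0
    if source_code != bus_code:
        count += 1  # translation needed on entry to the bus
    if bus_code != sink_code:
        count += 1  # translation needed on exit from the bus
    return count


# Three different codes along the path call for two translators.
print(translators_needed("replication", "parity", "Hamming"))
```

Using one code throughout drives the count to zero, which is the design preference stated in the text.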

For  the  applications  considered  here,  where  the  systems  are  small  and 
the  number  of  devices  is  of  the  order  of  100  or  less,  the  processor  buses 
should  be  short  and  the  interface  requirements  should  dominate.  Thus,  the 
problem  can  be  viewed  as  just  an  extension  of  the  design  of  protected  logic. 

For monolithic microprocessors, which are most likely to employ some version
of  TMR,  the  processor  internal  buses  are  protected  by  TMR  when  the  functional 
units,  i.e.,  processor,  memory,  are  viewed  as  one  logic  unit:  the  buses  are 
indistinguishable  from  other  signal  paths. 

For bit-slice microprocessors, the buses between the processor, memory,
and control can be viewed as indistinguishable for many applications. Thus,
the same considerations drive the bus protection problem as the functional
units and the same techniques can be employed. Since the control and processor
units will quite likely be implemented using some form of replication, the
nature of the problem will be one of transitioning between various forms of
duplication and triplication. The memory to the control and processor bus
problem is different for a number of reasons. First, as just indicated, the
processor and control will, more than likely, be protected by some form of
replication, while the memory will, most likely, be protected by some
conventional code, like Hamming. Second, the number of buses can vary between
one and three, depending on whether the communication is split between an
input or output bus and/or whether the bus is segregated by function between
address and data. Lastly, the treatment of the control signals in the
receiving units can vary between checking whether the signal has been
correctly transmitted to whether the signal is properly received and
returning a signal to the control [illegible in source], indicating the
results of the control transmission (error/non-error).


7.  Clock

The reliable generation and distribution of timing signals is needed
to ensure proper operation of synchronous sequential circuits in
self-diagnosing processors. Errors on these timing or clock lines may be due
to interconnection failures or malfunctions in the source of the signal. For
errors produced by malfunctions in the generation mechanisms, errors can be
classified as either catastrophic, which is equivalent to a stuck-at fault on
the line, or variational, as in a frequency change. Further classification
has been established based on the circuit used to monitor the line:

1)  Discrete, charge-discharge circuits

2)  Retriggerable monostable multivibrator circuits

3)  Digital  counter  circuits 

4)  Integrator  circuits 

None of the schemes can check the input for all possible errors without
resorting to duplication, and each of the circuits is susceptible to
undetectable internal faults. Usas¹⁷ describes a self-checking periodic signal
checker as shown in Figure II-1. It uses the same hardware as a duplication
scheme using a retriggerable monostable circuit, but it is self-testing and
the duplicated design is not. Additionally, this checker also detects duty
cycle errors. Since M1 and M2 are arranged to run 180° out-of-phase, the
circuit detects unidirectional errors and the two monostables can be realized
in a single integrated circuit package without concern for failures affecting
the common power and ground distribution to the individual circuits. It is
recommended, for effective error detection, that the checker be wired to a
memory element or clocked module following the last fanout branch. This
permits the detection of faults on any of the fanout points in the wiring of
the clock line.
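
The behavior of a monostable-based clock monitor can be illustrated with a short Python simulation. This is a simplified sketch and not a model of the Figure II-1 circuit: it assumes two retriggerable monostables, one retriggered by rising edges and one by falling edges, and it flags an error when either one times out. The `timeout` parameter and the sampled-waveform representation are hypothetical. The sketch catches stuck-at and slow-clock conditions only; it does not catch a clock that runs too fast.

```python
def check_periodic(samples, timeout):
    """Return the first sample index at which an error is flagged, or None.

    samples: list of 0/1 clock levels taken at unit time steps.
    timeout: monostable pulse width in time steps (hypothetical parameter).
    """
    last_rise = last_fall = 0  # both monostables are triggered at start-up
    for t in range(1, len(samples)):
        prev, cur = samples[t - 1], samples[t]
        if prev == 0 and cur == 1:
            last_rise = t      # retrigger M1 on a rising edge
        elif prev == 1 and cur == 0:
            last_fall = t      # retrigger M2 on a falling edge
        if t - last_rise > timeout or t - last_fall > timeout:
            return t           # one monostable has timed out
    return None


good_clock = [0, 0, 1, 1] * 5
stuck_high = [0, 0, 1, 1] * 2 + [1] * 10
stuck_low = [0, 0, 1, 1] * 2 + [0] * 10
for waveform in (good_clock, stuck_high, stuck_low):
    print(check_periodic(waveform, timeout=5))
```

Because one monostable depends on rising edges and the other on falling edges, a line stuck at either level eventually lets one of them expire.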


17 Usas, A.M., "The Detection of Errors in Periodic Signals", Technical
Note #46, Stanford University, April 1974.

Figure II-1. Totally Self-Checking Periodic Signal Checker


The error indicator is designed to complement the checker and provide visual
output for both fixed and momentary errors, as shown in Figure II-2. It is not
self-testing but is fault secure with respect to all faults affecting only a
single flip-flop.

Figure II-2. Fail Safe Error Indicator


Other approaches utilize an array of identical oscillator modules to
produce a number of phase-locked clock signals. A technique commonly used
is majority vote among 2f + 1 redundant signals, which produces a valid output
if the redundant inputs are suitably synchronized. Daly¹⁸ shows that the
simple majority function is insufficient and that "glitches" or sliver pulses
can result because the output depends on the failed elements during part
of the clock period. He shows that by incorporating hysteresis as in-line
receivers, the difficulty can be eliminated. Figures II-3 and II-4 show one of
four identical elements of a fault-tolerant clock system module that will
tolerate the failure of any one element using conventional TTL integrated
circuits. The circuitry of Figure II-3 is designed to receive and vote on four
clock outputs, A, B, C, and D. It generates the majority functions and
produces pulse outputs corresponding to the falling and leading edges of the
majority function. Z10 is a monostable multivibrator that determines the clock
frequency. Z11 is a one-shot that establishes a lower limit on switching time.


18 Daly, W.M., "A Fault Tolerant Digital Clocking System", FTC 73,
pg. 17-22.

A clock receiver (hysteresis voter) consists of the logic of Figure II-4,
plus a flip-flop that is set by DHSTA and reset by DSKTA. The flip-flop output
is the desired synchronized clock signal and drives the user circuitry.
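
The difference between a simple majority gate and a voter with hysteresis can be illustrated with a short Python sketch. The thresholds used here (set when at least three of four inputs are high, reset when at most one is high) are an assumption chosen for the example and are not taken from Daly's circuit; the input sequences are likewise made up, with the last input in each frame playing the role of the failed element.

```python
def simple_majority(inputs):
    """Output of a plain majority gate over an odd number of inputs."""
    return 1 if sum(inputs) > len(inputs) // 2 else 0


def hysteresis_vote(state, inputs, set_at, reset_at):
    """Set at >= set_at highs, reset at <= reset_at highs, otherwise hold."""
    highs = sum(inputs)
    if highs >= set_at:
        return 1
    if highs <= reset_at:
        return 0
    return state


def run_hysteresis(frames, set_at, reset_at, state=0):
    """Apply the hysteresis voter to a sequence of input frames."""
    outputs = []
    for inputs in frames:
        state = hysteresis_vote(state, inputs, set_at, reset_at)
        outputs.append(state)
    return outputs


# Two good clocks rising with skew, plus one failed input (last position).
three_inputs = [(0, 0, 1), (1, 0, 1), (1, 0, 0), (1, 1, 0)]
print([simple_majority(f) for f in three_inputs])

# Three good clocks rising with skew, plus one failed input (last position).
four_inputs = [(0, 0, 0, 1), (1, 0, 0, 1), (1, 1, 0, 1),
               (1, 1, 0, 0), (1, 1, 1, 0)]
print(run_hysteresis(four_inputs, set_at=3, reset_at=1))
```

In the first sequence the majority output rises, drops, and rises again during a single good-clock transition, which is the sliver pulse described above. In the second, the output changes once and then holds, because the failed input alone cannot move the count across both thresholds.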

Another intrinsic clock approach employs oscillator standby redundancy
and majority voting with hysteresis. Each oscillator contains switching
circuitry to select one of the three oscillators to develop the redundant
clock signals, as shown in Figure II-5. Operation of the circuit is such that
if oscillator A driving the majority gates fails, the detector causes its
latch to set, causing the switchover logic to select the next available
oscillator. If oscillator B's latch is not set, it provides the clock signal
for the system. If, however, B's latch is also set, then oscillator C is
selected by the crossover switch to drive the majority gates. By providing
external control of the latches, switchover can be commanded by the using
system for reasons such as excessive frequency drift and testing.
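
The switchover priority just described amounts to picking the first oscillator whose failure latch is clear. A minimal Python sketch follows; the dictionary representation of the latches is an assumption for illustration, and the case of all three latches set is not addressed in the text, so the sketch simply returns None.

```python
def select_oscillator(latches):
    """Return the first oscillator, in priority order A, B, C, whose
    failure latch is not set, or None if all three latches are set."""
    for name in ("A", "B", "C"):
        if not latches[name]:
            return name
    return None


# Oscillator A has failed and its latch is set, so B is selected.
print(select_oscillator({"A": True, "B": False, "C": False}))
```

Setting a latch externally, for example to test the standby units or to retire a drifting oscillator, has the same effect on selection as a detected failure.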

The reliability model of the system is shown in Figure II-6. It can be
seen that, rather than the conventional redundant voted approach (Figure
II-6a), where two out of three oscillators are required for success, only one
out of three is required (Figure II-6b). Although the added circuits in the
oscillator chain slightly increase the failure rate of the A and B oscillator
channels, the net effect is increased reliability because of the increased
tolerance: only one oscillator is necessary for successful operation.
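
The comparison can be made concrete with a small Python sketch. It assumes independent failures, identical oscillator reliability `r`, and perfect voting and switching apart from a single factor `s` that stands in for the added circuits in the A and B channels. These symbols and the numerical values are hypothetical and are not taken from Figure II-6.

```python
def r_two_of_three(r):
    """Reliability when at least two of three identical oscillators work."""
    return 3 * r**2 - 2 * r**3


def r_one_of_three(r, s=1.0):
    """Reliability when any one of three channels suffices.

    Channels A and B carry added circuits with reliability s (assumed);
    channel C is a bare oscillator with reliability r.
    """
    channel_ab = r * s
    return 1 - (1 - channel_ab) ** 2 * (1 - r)


r, s = 0.9, 0.99
print(r_two_of_three(r), r_one_of_three(r, s))
```

For these illustrative values the two-of-three arrangement gives about 0.972 and the one-of-three arrangement about 0.9988, so the small penalty from the added circuits is outweighed by the relaxed success criterion.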

For the self-diagnosing applications, a combination of the standby
redundant oscillators and the fault-tolerant receiver, similar to that
described (Figure II-4 plus a flip-flop), is recommended for synchronization.
An error indicator circuit should be added for failed synchronizer error
location indication.


Figure II-3. Clock Receiver (Hysteresis Voter)